  1. 1. Journal of the American Medical Informatics Association

A corpus-based approach for automated LOINC mapping

Journal: Journal of the American Medical Informatics Association
Manuscript ID: amiajnl-2012-001159.R1
Article Type: Research and Applications
Keywords: automated mapping, LOINC, local laboratory tests, health information exchange, supervised machine learning, information retrieval

Confidential: For Review Only
http://mc.manuscriptcentral.com/jamia
  2. 2. Figure 1: Growth in unique LOINCs mapped to local terms and unique words in local term descriptions as the number of local terms in the corpus has expanded over time.
  3. 3. Figure 2: Results of 20 iterations of repeated random sub-sampling validation showing the percentage of test terms with manually mapped LOINCs ranked first (top one) and among the top five by Maxent and Lucene.
  4. 4. Figure 3: Rank of correct LOINCs and their Maxent score for local laboratory terms from three test institutions.
  5. 5. Figure 4: Performance of Maxent and Lucene when applying the test set against a growing corpus of local terms.
  6. 6. Response to Reviewers’ Comments

The following are respectfully submitted in response to the reviewers’ comments. The comments were appreciated and the authors hope that all issues and suggestions are adequately addressed.

Associate Editor

Three expert reviewers have commented on your paper. We apologize for the delay in getting these reviews to you.

While there is support for the direction of your work and the methods applied, two of the three reviewers have offered comments and suggestions which add up to a need for significant revision of your paper. We ask that you address all of these comments. Reviewer 1, in particular, has asked that you make the paper easier to comprehend and seeks clarification of your "gold standard". Reviewer 2 is asking you to round out your paper into a more scientific work by grounding your methods in what other researchers have done and specific choices you made in the conceptualization of your study.

Reviewer: 1

However, I think the paper misses the opportunity to inform the readers more specifically how they can take advantage of this work. Is the repository of local terms available to others? Are the analyzing programs available to others to use, rather than reinvent? With the exception of RELMA mapping assistant [Lab Auto Mapper], there was no discussion of sharing.

[Author Response]

We agree that it would be helpful to clarify and expand our discussion of the generalizability and practicality of our approach. We have significantly revised the Discussion and added a specific section about practical application.

What the paper told me primarily was the result of what you did with 3 institutions. The three institutions are referred to in the Abstract as “novel” institutions. I have no idea what that means or why you used it.
Within the university, you refer to these as test institutions. As the results were different across the three test institutions, I would, at a minimum, want to know some of the characteristics of each institution. Were they large or small; were they academic medical centers or rural clinics? I am surprised that you did not identify the institutions – I see no invasion of privacy or conflict of interest. Then, the reader would at least have some comparison basis.

[Author Response]

We have made edits throughout the paper and now consistently refer to these 3 institutions as “test institutions” and their terms as test sets. We agree that the analysis based on the 3 institutions is the most applicable to the typical use case - mapping a new institution’s terms to LOINC. The 80/20 split
  7. 7. analyses and characterization of model performance by simulating the corpus expansion over time were additional ways to investigate the robustness of this corpus-based approach.

We have added the details that the test institutions were community hospitals in central Indiana. The INPC participation agreement stipulates that the data may not be used for research that directly compares institutions. Our research project was approved by the IRB and the INPC management committee, but as a courtesy in the spirit of that clause we opted to mask the specific identity of the institutions. We have used this same approach in several previously published mapping studies.

I found the paper difficult to read. I really had to work hard to understand what you were doing; some of that may be me, but I think you could have made it easier. First, I would assume many of your readers would not know the details of Apache’s OpenNLP Maxent. A short description of what it does would significantly help an understanding of how it was used in your research. I would add even more detail about how you used Lucene, along with a brief description. Did you use Lucene to pull out individual words in a local term and then match and score those words against the text of the LOINC code? Did you make an initial pass to pull out terms that were a direct match with the LOINC terms? I think just a little more detail would make it much clearer what you did.

[Author Response]

We have made significant revisions to address your concerns for clarity and additional explanation. Specifically, we added descriptions of Lucene and Maxent and how the models were constructed. We also added an entire section illustrating the formation of these two models and their scores using a short example corpus.
We appreciate this reviewer’s and the other reviewer’s call for examples and think that their inclusion significantly improves the readability of the paper.

Did you consider looking at the axes other than name in matching local to LOINC codes? I think if you included some examples of the kind of things in the corpus of terms, in the local terms, and showed specifically the result of some matches that worked and some that did not, that would be very useful.

[Author Response]

As noted above, we did include some examples showing the models in action. We also clarified in several places (most notably the Discussion) that our approach relies exclusively on the rich corpus of local term descriptions and does not directly reference the LOINC terminology. An advantage of this method is that it does not require domain specific techniques or data. For example, we don’t have to detect that a particular token is the analyte being measured or the specimen. We have considered (and participated in) other research that does leverage additional semantics (such as the units of measure) and data (e.g. average result values, whether the test is performed more often on males or females, etc). We think this is also a worthy line of research, but the focus of this analysis was on a simple, corpus-based approach.

I was really confused with the numbers and content of Table 1. How does the text above the table relate to the table? The percentages do not match.
  8. 8. [Author Response]

We have added a more descriptive title to this table (which is now table 5) and the narrative that refers to it. The percents listed in the text were the average of the percents across those three institutions (e.g. for Maxent’s ranking of the correct LOINC code #1 across these institutions, the average is (.786+.735+.846)/3 x 100% = 78.9%).

I also do not understand the difference in the upper half and the lower half of the Table. I also do not understand the numbers in the text below the line. I think this material needs to be better organized. You seem to be jumping around in your thoughts faster than I can track.

[Author Response]

We have added a more descriptive title to this table (which is now table 5) and the narrative that refers to it. The first three rows pertain to the three methods’ performance in ranking the correct LOINC code first. The second three rows pertain to the three methods’ performance in ranking the correct LOINC code in the top 5.

We have added additional subheadings throughout the manuscript to help make the organization more clear.

I sort of understand the corpus is the gold standard. I assume it is because all of these terms were matched manually, at least those in question. How was the threshold set, and what is the significance on accuracy by slight movements?

[Author Response]

The reviewer is correct - these existing mappings in the corpus served as the Gold Standard for our analyses because they were already in production from the operational health information exchange (i.e. the INPC).
The mapping team at Regenstrief has an extraordinary amount of experience, but we know from prior analyses that even experienced experts make mistakes in mapping (see Lin et al “Correctness of Voluntary LOINC Mapping for Laboratory Tests in Three Large Institutions”). For the purpose of this analysis, which focused on leveraging what already existed, we considered this level of known error acceptable. We are not entirely sure what the reviewer means in the question about a threshold. If it is in reference to the Gold Standard mapping, this is basically the threshold for clinical acceptability of test equivalence as determined by a mapping team with nearly 20 years of experience. We clarified this by adding an explicit reference in the Methods to a previous paper describing the general approach to mapping in the INPC. In terms of how the performance of the models is affected by different kinds or sets of terms, this was the reason for doing 2 types of cross validation: the random 80/20 subset method and the holdout of three test institutions.

I would like some discussion on your thoughts for the overall process. Is it your opinion that any institution could use your corpus and your processes (applications) to map their local terms to LOINC? What would be the expected accuracy, and what would be the expectation of what is matched? What would be done with the rest? Manual?
  9. 9. [Author Response]

We have significantly revised the Discussion to present a more clear rationale and set of considerations for applying the results of this study. In short, if a large mapping corpus was available, a Maxent threshold score could be used to identify a big chunk of terms that would need little (if any) human review and produce a reasonably accurate ranked list of candidate LOINC terms for the remainder that would expedite the human review process.

Reviewer: 2

Lack of justification for the methods. This work is almost more on the side of engineering than science. The choice of the methods is essentially justified more by convenience (availability of an open source implementation). It would be nice to show that the methods selected have been successfully used in similar contexts and perform as well as or better than other methods (to be named and contrasted against).

[Author Response]

We agree that a better justification is needed. We added a few additional details in the methods section about some of these choices, and then added a much richer discussion of these considerations in a separate subsection of the Discussion.

Lack of reflection on the use of the proposed method in the strategy for mapping local terms to LOINC (compared to existing tools). It would be nice to give some advice to LOINC users and the developers of systems integrating LOINC into a local system (i.e., vendors) as to which mapping approaches and tools are best adapted to which content, and how to best use them in combination.

[Author Response]

We agree that additional discussion of the practical application of these results was necessary. We addressed this by adding a more explicit Rationale subsection, a Maxent vs Lucene comparison subsection, and a Considerations for practical application subsection.
Along the same lines, justification should be provided for the choice of the top and top 5 terms in the evaluation. Does it correspond to a particular use case? (e.g., top mapping for completely automatic mapping)

[Author Response]

We agree that we should have explained our rationale here more clearly. Given the level of accuracy of the automated methods, our assumption is that most operational uses will still require human review (except perhaps those ranked 1 that score above the Maxent threshold). A high quality, short list of candidate terms is much easier for domain experts to review than a term-by-term search. We added explanation of this choice in the methods section.

Lack of examples. Examples should be added throughout the manuscript to illustrate the methods.
  10. 10. [Author Response]

As mentioned in the comments to Reviewer 1, we have made significant revisions to address your concerns for clarity and additional explanation. Specifically, we added descriptions of Lucene and Maxent and how the models were constructed. We also added an entire section illustrating the formation of these two models and their scores using a short example corpus. We appreciate this reviewer’s and the other reviewer’s call for examples and think that their inclusion significantly improves the readability of the paper.

The process of selecting part of the corpus for training and the rest for testing repeatedly (n times) is called n-fold cross validation (n=20 here). Please explain why you are doing it.

[Author Response]

We agree that we should have explained our rationale more clearly. We actually used a repeated random subsampling method and not n-fold cross validation because we did not ensure that every term was included in the validation/test set. We have added further clarification on this in the methods and indicated that it is complementary to the holdout by institution approach we used in another analysis.

Unclear if any normalization is applied to words. If not, is this a limitation?

[Author Response]

The normalization for both the MaxEnt and Lucene approaches was the same, and was performed by the Apache Lucene v3.0.3 standard analyzer. We have clarified this in the Methods section by creating a subsection and have also added a description of the standard analyzer and an example of what it does. The process for the RELMA Auto Mapper approach did not use the Lucene Standard Analyzer, but rather followed the recommended procedures as described in the RELMA training materials.
The failure analysis should be part of the discussion, not results.

[Author Response]

We agree and have included it there.

Please organize the discussion in subsections.

[Author Response]

We agree and have done so.

This work could be contrasted against work about mapping local terms to standard terminologies beyond LOINC. See for example: Peters L, Kapusnik-Uner JE, Nguyen T, Bodenreider O. An approximate matching method for clinical drug names. AMIA Annu Symp Proc. 2011;2011:1117-26. Epub 2011 Oct 22. PubMed PMID: 22195172; PubMed Central PMCID: PMC3243188.
  11. 11. [Author Response]

We agree that this is a helpful paper to contrast against and have done so in the Discussion.

Reviewer: 3

For one thing, we all need to understand the real cost of using a method such as you propose. That is, what does it cost to find and deal with the false positives and false negatives? As you observe, the costs of manual and semi-manual methods are high and well documented. Methodologically, we also need to know if the mappings missed by the best method are also missed by the other two methods - likely, or - less likely - whether the other two methods correctly map terms incorrectly mapped by the best method.

[Author Response]

We agree that this is an important consideration and have added further detail about the performance of Lucene and Maxent at ranking the correct LOINC code missed by the other model. The discussion also addresses how the RELMA Lab Auto Mapper can find matches for some of the terms that are missed by the corpus-based models by querying the LOINC terminology directly. We have also sketched out a suggested mapping process using a corpus based approach in the new subsection on considerations for practical application.
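
The evaluation measures discussed in the responses above (the share of test terms whose manually mapped LOINC is ranked first or within the top k candidates, and the mean of the per-institution percentages) can be sketched in Python as follows. This is a minimal illustration and not the authors' code: the function name `top_k_rate`, the term identifiers and the toy ranked lists are hypothetical. Only the three proportions 0.786, 0.735 and 0.846 are taken from the response to Reviewer 1.

```python
from statistics import mean


def top_k_rate(ranked_candidates, gold, k):
    """Fraction of terms whose gold-standard LOINC is among the first k candidates."""
    hits = sum(
        1
        for term, candidates in ranked_candidates.items()
        if gold[term] in candidates[:k]
    )
    return hits / len(ranked_candidates)


# Hypothetical ranked output for three local terms (best candidate first).
ranked = {
    "term_a": ["1006-6", "1003-3", "1968-7"],
    "term_b": ["1003-3", "1006-6", "1968-7"],
    "term_c": ["1968-7", "1006-6", "1003-3"],
}
# Hypothetical gold-standard mappings for the same terms.
gold = {"term_a": "1006-6", "term_b": "1006-6", "term_c": "1003-3"}

top1 = top_k_rate(ranked, gold, k=1)
top2 = top_k_rate(ranked, gold, k=2)

# Per-institution top-one proportions for Maxent quoted in the response letter.
maxent_top1_by_institution = [0.786, 0.735, 0.846]
mean_top1_percent = round(mean(maxent_top1_by_institution) * 100, 1)

print(f"top-1 rate: {top1:.3f}, top-2 rate: {top2:.3f}")
print(f"mean Maxent top-1 across institutions: {mean_top1_percent}%")
```

Note that averaging per-institution percentages weights each institution equally, regardless of how many terms it contributes; pooling all terms before computing the rate would generally give a different figure.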
  12. 12. A corpus-based approach for automated LOINC mapping

Mustafa Fidahussein MD, MS, Daniel J. Vreeman PT, DPT, MSc

Regenstrief Institute, Inc, and Indiana University School of Medicine, Indianapolis, IN

Abstract

Objective: To determine whether the knowledge contained in a rich corpus of local terms mapped to LOINC could be leveraged to help map local terms from other institutions.

Methods: We developed two models to test our hypothesis. The first, based on supervised machine learning, was created using Apache’s OpenNLP Maxent, and the second, based on information retrieval, was created using Apache’s Lucene. The models were validated by a random sub-sampling method that was repeated 20 times and that used 80/20 splits for training and testing respectively. We also evaluated the performance of these models on all laboratory terms from three test institutions.

Results: For the 20 iterations used for validation of our 80/20 splits, Maxent ranked the correct LOINC first for between 70.5% and 71.4% of terms and Lucene for between 63.7% and 65.0%. For all laboratory terms from the three test institutions, Maxent ranked the correct LOINC first between 73.5% and 84.6% (mean 78.9%), whereas Lucene’s performance was between 66.5% and 76.6% (mean 71.9%). Using a cutoff score of 0.46, Maxent always ranked the correct LOINC first for over 57% of local terms.

Conclusion: This study showed that a rich corpus of local terms mapped to LOINC contains collective knowledge that can help map terms from other institutions. Using freely available software tools, we developed a data-driven automated approach that operates on term descriptions from existing mappings in the corpus.
Accurate and efficient automated mapping methods can help accelerate adoption of vocabulary standards and promote widespread health information exchange.

Keywords: automated mapping, LOINC, local laboratory tests, health information exchange, supervised machine learning, information retrieval, Maxent, Lucene.

Background and Significance
  13. 13. Health information technology has the potential to improve the quality and efficiency of care.[1] However, the clinical data needed to make care decisions are often unavailable to providers at the right time and place.[2] While our patients seek care across many settings and institutions[3], the purview of our clinical information systems is usually curbed at organizational boundaries. Even within a single institution, the laboratory, radiology, pharmacy and clinical note writing systems may function like data “islands”. Efficiently moving and aggregating patient data creates an important foundation for many tools and processes with the capability of improving healthcare delivery. The Health Information Technology for Economic & Clinical Health (HITECH) act considerably increases the prospect of widespread electronic health record systems (EHRs) with health information exchange capabilities.[4] HITECH requires that providers and hospitals demonstrate EHR information exchange to be eligible for the Medicare and Medicaid incentive payments.

A central barrier to efficient health information exchange is the unique local names and codes for the same clinical test or measurement performed at different institutions. When integrating many data sources, the only practical way to overcome this barrier is by mapping local terms to a vocabulary standard. Logical Observation Identifiers Names and Codes (LOINC®) is a universal code system for identifying laboratory and clinical observations.[5] When LOINC is used together with messaging standards such as HL7, independent systems can create interfaces with semantic interoperability for electronically reporting test results.
LOINC has been adopted both in the United States and internationally by many organizations, including large reference laboratories, healthcare organizations, insurance companies, regional health information networks and national standards.[6-8] Within the USA, one recent and notable adoption of LOINC is as the standard for laboratory orders and results in the Standards and Certification Criteria of the Centers for Medicare and Medicaid Services EHR “Meaningful Use” incentive program.[9]

Before care organizations can realize the benefit of using vocabulary standards like LOINC, they must first map their local test codes to terms in the standard. Unfortunately, this process is complex. It requires considerable domain expertise and is very resource-intensive.[8,10-12] Reducing the effort required to accurately map local terms to LOINC would accelerate interoperable health information exchange and would be especially helpful to resource-challenged institutions.

The Regenstrief LOINC mapping assistant (RELMA), a desktop program freely distributed with LOINC (http://loinc.org), is widely used by domain experts to map their local terms to LOINCs on a term-by-term basis.[12-15]

  14. 14. It also contains a feature called the RELMA Auto Mapper that batch processes a set of local terms and identifies a ranked list of candidate LOINCs for each local test in the collection. While RELMA’s automated mapping feature has accurately mapped radiology report terms[16,17], laboratory terms present special challenges because of their characteristically short and ambiguous test names.[8,10,18]

Previous studies have described several methods and tools for mapping laboratory terms to LOINC. Lau et al used parsing and logic rules in conjunction with synonyms, attribute relationships and mapping frequency data to map local laboratory test names to LOINC.[19] This paper was a descriptive analysis that did not include an evaluation of its accuracy. Zollo et al used extensional definitions of laboratory concepts generated from actual test result data to map between two laboratories using a common dictionary that was also linked to LOINC.[20] The automated matching software that leveraged these extensional definitions correctly identified 75% of the possible matches. In addition to establishing new mappings, extensional definitions have also been used for auditing and characterizing the degree of interoperability of existing local laboratory terms to LOINC mappings.[11,21] Sun and Sun evaluated the performance of an automated lexical mapping program on terms from three institutions to LOINC.[22] The overall best lexical mapping algorithm identified the correct LOINC for between 63% and 75% of local terms.
Kim et al described an approach for augmenting local test names that modestly improved mapping results using RELMA for term-by-term mapping.[18] Lastly, Khan et al developed an automated tool that used a master file of mapped local terms from several sites within the Indian Health Service.[15] The local terms at these sites shared a common heritage, but had diverged over time in their naming conventions. Compared with a gold standard mapping established by a term-by-term search with RELMA, the automated method correctly mapped 81% of the test terms.

Over the last 18 years, Regenstrief has mapped local terms from many institutions to a common dictionary as part of the process of creating and expanding the Indiana Network for Patient Care (INPC), a comprehensive regional health information exchange.[23] Thus, the INPC dictionary now represents a rich corpus of local terms mapped to LOINC. Like Lau et al and Khan et al, we hypothesized that the knowledge contained in this corpus of mappings could be leveraged to help map local terms from other institutions.

To test this corpus-based approach, we developed two models based on supervised machine learning and information retrieval using open source tools. Our data-driven approach relies exclusively on a rich corpus of local term descriptions and does not directly reference the LOINC terminology. In this study we present the process of
  15. 15. creating and validating these models and testing their performance on a set of local laboratory terms from three institutions. We also compare the performance of these models to the recently improved Lab Auto Mapper feature within RELMA.

Methods

Establishing the gold standard and normalizing the corpus

We compiled a corpus of all local terms from 104 different institutional code-sets that were mapped to LOINC through the INPC common dictionary between 1997 and 2012. Each local term from these sets had been mapped by domain experts at Regenstrief through manual review, assisted by the use of RELMA and other locally developed tools. For all analyses, these existing LOINC mappings from the operational health information exchange served as our gold standard. A description of how Regenstrief performs and maintains the mappings in the INPC has been published previously.[12] We did not perform additional auditing of the mappings as part of this analysis.

For each local term in the corpus, the set of words constituting its description (e.g. the laboratory test name) was normalized using the Standard Analyzer from Apache Lucene v3.0.3.[24,25] The Lucene Standard Analyzer uses lexical rules to recognize alphanumeric characters, convert strings to lowercase, and remove stop words. For example, the local term descriptions “CSF CELL COUNT/DIFF” and “GLU (TOL) UR-5 HR” would be normalized to “cell count csf diff” and “glu hr tol ur-5” respectively.

Creating a model based on supervised machine learning – Maxent

We used Apache’s OpenNLP Maxent v3.0.1[26] to create a maximum entropy based statistical algorithm for supervised machine learning.
The principle of maximum entropy provides a probability distribution that is as uniform as possible by assuming nothing about what is unknown.[27] The probability distribution derived from human specified constraints in training data is then used to predict the probability of a random set of constraints in test data.

To create a Maxent model, each local term in the training set was considered as a separate event with its normalized description used as predicates and the mapped LOINC used as outcome. When local terms from the test set were applied against this model, Maxent calculated a probability score between zero and one for each LOINC (outcome)
  16. 16. contained in the corpus. The LOINCs with the highest score (Top 1) and those with the highest five scores (Top 5) were noted for each local term.

Creating a model based on information retrieval – Lucene

We used Apache’s Lucene v3.0.3[24] to create an information retrieval based model. Lucene is a popular information retrieval library that creates documents with indexed fields for fast searching. Its scoring formula measures the similarity between indexed fields and search terms for each document.[25] To create a Lucene model, we created a separate document for every unique LOINC in the training set. Each document contained the normalized descriptions from all local terms mapped to that LOINC as its indexed field. When local terms from the test set were queried against this model, Lucene calculated a score for each LOINC (document) contained in the corpus. This score was based on the number of times queried words co-occurred with that document and the total number of documents associated with those words. The Lucene score has a minimum of zero and no upper bound.

An example of the models created by Maxent and Lucene

To illustrate use of the Maxent and Lucene models, consider a corpus that contains only five terms from two different institutions with manually mapped LOINCs as shown in Table 1. The data from this corpus are used to create a Maxent model with five events and three outcomes as shown in Table 2. They are also used to create a Lucene model with three documents and corresponding indexed fields as shown in Table 3. Note that the Lucene model concatenates all the term descriptions from different institutions mapped to the same LOINC.
Now suppose that a test institution contains five unmapped terms whose descriptions contain only the words “indirect”, “direct”, “coombs”, “bilirubin” and “direct test”, respectively. When these term descriptions are applied against Maxent and Lucene, each model returns a set of three scores that represents the likelihood of that test term being mapped to each of the three LOINCs contained in the corpus (Table 4).

Table 1: Hypothetical corpus containing five local terms from two different institutions.

Institution   Local Code   Term Description       Mapped LOINC
1             12802        Indirect AGT           1003-3
1             18231        Direct Coombs IgG Ab   1006-6
2             DCTG         Direct Coombs Test     1006-6
2             IAT          Indirect Coombs        1003-3
2             BILID        Bilirubin, Direct      1968-7

Table 2: Representation of the Maxent model based on the corpus shown in Table 1.

Event #   Predicates (normalized term descriptions)   Outcome
1         agt indirect                                1003-3
2         coombs indirect                             1003-3
3         ab coombs direct igg                        1006-6
4         coombs direct test                          1006-6
5         bilirubin direct                            1968-7

Table 3: Representation of the Lucene model based on the corpus shown in Table 1.

Document ID   Indexed Field
1003-3        agt indirect coombs indirect
1006-6        ab coombs direct igg coombs direct test
1968-7        bilirubin direct

Table 4: Maxent and Lucene scores for each LOINC from the corpus in Table 1 when local term descriptions are queried against both models.

                     Maxent Model Scores          Lucene Model Scores
Term Description     1003-3   1006-6   1968-7     1003-3   1006-6   1968-7
“indirect”           0.9134   0.0432   0.0432     1.9876   0.0000   0.0000
“direct”             0.1352   0.4558   0.4090     0.0000   1.4142   1.0000
“coombs”             0.5019   0.3704   0.1277     1.0000   1.4142   0.0000
“bilirubin”          0.0241   0.0241   0.9517     0.0000   0.0000   1.4054
“direct, test”       0.0136   0.9451   0.0412     0.0000   1.9651   0.2899
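The way Tables 2 and 3 follow from Table 1 can be sketched in a few lines of Python. This is only an illustration and not the code used in the study. The `normalize` helper is a hypothetical stand-in (lowercasing, stripping punctuation, and sorting tokens, which happens to reproduce the strings shown in Table 2), and plain Python containers stand in for the Maxent events and the Lucene documents.

```python
import re
from collections import defaultdict

# Table 1 corpus: (institution, local code, term description, mapped LOINC)
CORPUS = [
    (1, "12802", "Indirect AGT", "1003-3"),
    (1, "18231", "Direct Coombs IgG Ab", "1006-6"),
    (2, "DCTG", "Direct Coombs Test", "1006-6"),
    (2, "IAT", "Indirect Coombs", "1003-3"),
    (2, "BILID", "Bilirubin, Direct", "1968-7"),
]


def normalize(description):
    """Hypothetical normalizer (an assumption): lowercase the description,
    drop punctuation, and sort the remaining tokens."""
    return " ".join(sorted(re.findall(r"[a-z0-9]+", description.lower())))


def maxent_events(corpus):
    """One event per local term: (predicates, outcome), as in Table 2."""
    return [(normalize(desc), loinc) for _, _, desc, loinc in corpus]


def lucene_documents(corpus):
    """One document per unique LOINC: the normalized descriptions of all
    terms mapped to it, concatenated into one field, as in Table 3."""
    fields = defaultdict(list)
    for _, _, desc, loinc in corpus:
        fields[loinc].append(normalize(desc))
    return {loinc: " ".join(parts) for loinc, parts in fields.items()}


events = maxent_events(CORPUS)
documents = lucene_documents(CORPUS)
```

Here `events` holds five (predicates, outcome) pairs and `documents` holds three indexed fields, one per LOINC. Training an actual Maxent model or building a Lucene index from these structures is left to the respective libraries.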
Evaluation approach

To characterize how well these models performed in mapping local terms to LOINC, we conducted three sets of analyses that are described in detail in the following sections. In each case, the top five scoring LOINCs were compared with the LOINC assigned by manual mapping (our gold standard). We chose to limit the list of LOINC codes returned by the analyses to the top five based on our practical experience with mapping and preliminary analyses that showed it was rare for the correct LOINC code to appear in the next few rankings. Domain experts can quickly review a short list of ranked candidate LOINC terms to determine which, if any, of the LOINC terms is the correct match. A longer list is more cumbersome to review, and our experience has been that mappers prefer an interactive search interface like RELMA to reviewing a long list of candidate codes.

Validating the models using 80/20 splits

We validated the predictive performance of both models by using a random sub-sampling method (80% for training and 20% for testing) that was repeated 20 times. For each of the iterations, 80% of local terms from our normalized corpus were randomly selected as the training set to create Maxent and Lucene models as described above. Normalized local term descriptions from the remaining 20%, which served as the test set, were then queried against both models. We chose this approach to cross-validation to help prevent the models from being overfitted. Splitting the corpus at the term level (rather than at the level of a whole set of terms from an institution) demonstrates the prediction of the models for a heterogeneous set of terms with varying naming conventions.
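The repeated random sub-sampling procedure can be sketched as follows. The sketch is illustrative only and rests on assumptions: `train_overlap_model` and `rank_loincs` are hypothetical stand-ins that score a LOINC by simple word overlap (they are neither Maxent nor Lucene), each term is assumed to be a pair of (list of normalized words, mapped LOINC), and the split size is obtained by simple rounding, which may differ from how the study computed it.

```python
import random
from collections import Counter, defaultdict


def split_80_20(terms, rng):
    """Randomly split terms into 80% training and 20% test."""
    shuffled = list(terms)
    rng.shuffle(shuffled)
    cut = round(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def train_overlap_model(training):
    """Stand-in model (an assumption): word counts per LOINC."""
    counts = defaultdict(Counter)
    for words, loinc in training:
        counts[loinc].update(words)
    return counts


def rank_loincs(model, words, k=5):
    """Return up to k LOINCs with a non-zero overlap score, best first."""
    scores = {loinc: sum(c[w] for w in words) for loinc, c in model.items()}
    ranked = sorted(scores, key=lambda loinc: (-scores[loinc], loinc))
    return [loinc for loinc in ranked if scores[loinc] > 0][:k]


def top_k_rates(model, test, k=5):
    """Fraction of test terms whose gold LOINC is ranked first / in the top k."""
    top1 = topk = 0
    for words, gold in test:
        ranked = rank_loincs(model, words, k)
        top1 += ranked[:1] == [gold]
        topk += gold in ranked
    return top1 / len(test), topk / len(test)


def repeated_subsampling(terms, iterations=20, seed=0):
    """Repeat the 80/20 split and evaluation; one (top1, top5) pair per run."""
    rng = random.Random(seed)
    results = []
    for _ in range(iterations):
        train, test = split_80_20(terms, rng)
        results.append(top_k_rates(train_overlap_model(train), test))
    return results
```

Replacing `train_overlap_model` and `rank_loincs` with calls into a real Maxent or Lucene model would leave the split and scoring logic unchanged, which is why the sketch keeps them separate.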
The top five scoring LOINCs resulting from each model were compared with the LOINC assigned by manual mapping (our gold standard).

Evaluating the models’ performance using test terms from three institutions and comparison to Lab Auto Mapper

We determined the performance of our models in mapping an entire set of local laboratory terms from three test institutions (community hospitals located in central Indiana). In this case, training sets comprising all terms in our corpus minus those belonging to the three test institutions were used to create Maxent and Lucene models as described above. Normalized local term descriptions from the corresponding test sets, containing all laboratory terms from the three institutions, were then applied against both models. This approach simulates the typical mapping scenario of integrating all the terms from a new institution’s laboratory system. Each institution’s code set covers the set of tests performed by a typical community hospital laboratory, and reflects the idiosyncratic naming
conventions established by that institution. The top five scoring LOINCs resulting from each model were compared with the LOINC assigned by manual mapping (our gold standard).

We also compared the performance of the models with RELMA’s Lab Auto Mapper for the test set of terms from these three institutions. For this analysis we used the most recent publicly available version, RELMA v5.6.[28] The Lab Auto Mapper uses a series of algorithms optimized for laboratory terms to generate a list of candidate LOINCs. In addition to using words contained in a local term’s description, it can also leverage information from battery terms, units of measure, common tests and the synonymy contained in LOINC. Its score is based on the number and proportion of words that match between the local term and the fully specified LOINC name.[29] We followed the recommended procedures for loading local terms into RELMA and running the Lab Auto Mapper as described in the RELMA Users’ Manual and the LOINC and RELMA tutorial produced by Regenstrief Institute.[29, 30]

Lastly, we investigated whether a threshold Maxent score could serve as a useful cutoff for always identifying the correct LOINC code. We first plotted the rank of the correct LOINC among the top five against its Maxent score for each term in the test set, and then evaluated the Maxent score above which the correct LOINC was always ranked first.

Evaluating the models’ performance on the corpus as it has grown over time

To determine our models’ performance against a growing corpus, we again used the test set of all local laboratory terms from the three institutions as above. However, this time 12 training sets were used, each containing local terms from the corpus (minus those in the test set) in chronological order and in increments of 6,400.
Thus, the first training set contained the first 6,400 local terms created in the corpus; the second training set contained the first 12,800 local terms; and the twelfth and last training set contained all the local terms. Normalized local term descriptions from the test set were applied against Maxent and Lucene models created from each of the 12 training sets. The top five scoring LOINCs resulting from each model were compared with the LOINC assigned by manual mapping (our gold standard).

Results

Our corpus from 104 institutional code sets contained 81,691 local terms, each associated with a description and mapped to a LOINC. These local terms were mapped to 7,565 unique LOINCs and contained 11,620 unique words
in their descriptions (test names). This corpus was built from 1997 to 2012 as a byproduct of the INPC expansion. New local terms were added to the INPC master dictionary and mapped to LOINC both because new institutions began to participate in the health information exchange and because participating institutions created new local terms. Figure 1 shows the growth in number of unique LOINCs and number of unique words associated with all local terms as the corpus has expanded with new local terms over time.

Results of validating the models using 80/20 splits

In each of the 20 iterations of random sub-sampling from our corpus into 80% for training and 20% for testing, there were 65,361 local terms in the training set and 16,330 local terms in the test set. The number of unique LOINCs to which these local terms were mapped varied between 7,115 and 7,190 for the training set and between 4,391 and 4,493 for the test set.

Maxent ranked the correct (manually mapped) LOINC first for 11,513 to 11,661 (70.5%-71.4%, mean 71.0%) of local terms in the test sets and ranked the correct LOINC among the top five for 13,871 to 14,073 (84.9%-86.2%, mean 85.5%). Lucene ranked the correct LOINC first for 10,407 to 10,610 (63.7%-65.0%, mean 64.3%) of local terms in the test sets and ranked the correct LOINC among the top five for 13,649 to 13,841 (83.6%-84.8%, mean 84.2%). These results for each of the 20 iterations are shown in Figure 2.

Results of Maxent, Lucene and Lab Auto Mapper using laboratory terms from three test institutions

The three institutions chosen as test sets contained 1,099, 1,705 and 838 local laboratory terms that were mapped to 573, 757 and 328 unique LOINCs respectively.
The results of applying these test sets against the Maxent model, the Lucene model, and the Lab Auto Mapper are shown in Table 5. Averaging the performance across these three institutions, the correct LOINC was ranked first for 78.9%, 71.9%, and 50.3% and ranked among the top five for 91.4%, 90.0%, and 68.6% of local terms when applied against Maxent, Lucene and Lab Auto Mapper respectively.

Table 5: Percentage of local laboratory terms from each test institution that when applied against Maxent, Lucene and Lab Auto Mapper had the correct LOINC ranked highest (Top 1) and among the highest five (Top 5).

                         Institution 1   Institution 2   Institution 3
                         n=1,099         n=1,705         n=838
Maxent Top 1             78.6% (864)     73.5% (1,253)   84.6% (709)
Lucene Top 1             72.6% (798)     66.5% (1,133)   76.6% (642)
Lab Auto Mapper Top 1    49.6% (545)     46.8% (798)     54.5% (457)
Maxent Top 5             90.5% (995)     88.8% (1,514)   94.7% (794)
Lucene Top 5             89.8% (987)     86.0% (1,466)   94.3% (790)
Lab Auto Mapper Top 5    71.8% (789)     66.9% (1,140)   67.1% (562)

For the 3,642 local terms in the three test sets, ranks of the correct LOINCs among the top five were plotted against their Maxent scores. As illustrated in Figure 3, this plot shows that when the score was above 0.46 the correct LOINC was always ranked first by the model. Using this cutoff score to identify top-ranked candidates with high certainty, Maxent ranked the correct LOINC first for 2,099 (57.6%) of the local terms.

Results of the models’ performance on the corpus as it has grown over time

Figure 4 illustrates the performance of both models on the test set containing all local terms from three institutions using a series of training sets that represent growth in the corpus over time. The training sets in this analysis organized the local terms in the corpus in chronological order by increments of 6,400 terms. The results show a gradual leveling off in Maxent’s performance and a slight decrease in Lucene’s performance as the number of terms in the corpus reached its maximum.

Discussion

Our study shows that a rich corpus of local terms mapped to LOINC can help map terms from other institutions. Overall, the supervised machine learning based Maxent model ranked the correct LOINC first for 79% and the information retrieval based Lucene model for 72% of local laboratory terms from our three test institutions. These results are similar in accuracy to the best reported automated techniques from prior studies of laboratory test mapping.
Our approach has the advantages of using freely available tools and only requiring local term descriptions as the data substrate.

Rationale for using Maxent and Lucene models

Given a rich corpus of existing mappings established by domain experts, we wanted to explore the validity and performance of a purely data-driven approach to automated LOINC mapping. We used Apache’s Maxent to create a
supervised machine learning model and Apache’s Lucene to create an information retrieval model, as these tools are freely available, offer good performance on typical personal computer hardware, and are relatively easy to deploy.

The usual application of Maxent models involves a small number of outcomes, as in natural language processing tasks like sentence detection and part-of-speech tagging. In this study, we created a Maxent model with thousands of outcomes represented by all unique LOINCs contained in the training corpus. We are not aware of prior studies that used Maxent in this manner or in the context of automated mapping.

Lucene is used widely in a variety of applications for document indexing and search engine functions.[31] Since version 5.0 (released December 2010), the search functionality in RELMA, including the Lab Auto Mapper, has been implemented with Lucene. Prior studies have demonstrated that RELMA is a very capable tool for mapping local terms to LOINC.[15-18,32,33] Our application of Lucene differs from RELMA in that we did not directly query the LOINC terminology at all. Whereas RELMA queries against the stylized LOINC names and synonyms included in LOINC, both the Lucene and Maxent models in our approach only queried against words from local term descriptions mapped to LOINC codes. We had hypothesized that the idiosyncratic variation present in a large corpus of local term descriptions might help overcome the challenge of relying on the synonymy in LOINC. Although the synonymy in LOINC is quite good for common abbreviations, the standards development process cannot possibly keep up with all the permutations of abbreviations seen in local test names. For example, just a few of the variants for “Neisseria gonorrhoeae” present in our corpus include: “N.GONORRHOEA”, “N.
GONORRHEAE”, “N.GONO”, “Gono”, “N. GONORR.”, “NEISS GONORR”, and “NEISSERIA GONORR”.

Our approach with the Maxent and Lucene models is relatively simple compared with the processing algorithm of the RELMA Lab Auto Mapper or the drug-centric token matching approach employed by Peters et al in mapping drug name variants to RxNorm.[34] The models in our approach were naïve to the semantics of tokens in the test descriptions. The Lab Auto Mapper has functions that try to identify the specimen (e.g. CSF or Serum) and uses the units of measure associated with the test to limit candidate LOINCs to those with a Property attribute consistent with those units. For example, based on an internal mapping table, the Lab Auto Mapper would only return LOINC codes with a Property of Mass Concentration if the local test had associated units of ug/dL. Similarly, the drug-centric token matching approach used by Peters et al [34] attempts to identify and perform special processing on the drug name in a local string that is not performed on the tokens that may represent other components of the name like
strength or dose form. An advantage of our data-driven approach is that it did not require any domain-specific tailoring.

Comparing the performance of Maxent versus Lucene

Maxent performed better than Lucene in ranking the correct LOINC first due to Maxent’s tendency to overfit the model. Maxent thus computes high scores for local terms with words that match very closely with those in the training sets. However, both models ranked the correct LOINC among the top five for about 90% or more of local terms from the three test institutions (91.4% for Maxent and 90.0% for Lucene, averaged across institutions).

Over the 20 iterations of random sub-sampling using 80/20 splits, Maxent on average identified the correct LOINC for 2.9% (473) of local test terms that Lucene failed to score among the top five. Conversely, Lucene on average identified 1.5% (251) of local test terms that Maxent failed to score among the top five. For our analyses on test sets from three institutions, Maxent identified the correct LOINC for 2.8% (101) of local terms that Lucene failed to score among the top five, whereas Lucene identified 2.4% (89) of local terms that Maxent failed to score among the top five. The relatively small number of terms ranked correctly by one model but not the other illustrates that they perform well on similar kinds of test descriptions.

One important advantage of Maxent over Lucene and Lab Auto Mapper is its normalized score. We used this normalized score to determine a helpful threshold above which only the correct LOINC was ranked first. Using this cutoff score, we found that over 57% of local terms in our three test institutions could be ranked with a high degree of certainty.
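A cutoff of this kind can be derived from the per-term observations with very little code. The sketch below is an assumption about one straightforward way to do it, not the procedure used to produce Figure 3: the function names are hypothetical, each observation is assumed to be a pair of (Maxent score of the correct LOINC, rank of the correct LOINC), and the sample values are invented for illustration.

```python
def certainty_cutoff(observations):
    """Highest score at which the correct LOINC was NOT ranked first.

    Any correct LOINC scoring strictly above this value was ranked first
    in the observed data.
    """
    return max((score for score, rank in observations if rank > 1), default=0.0)


def share_above_cutoff(observations, cutoff, total_terms):
    """Fraction of all test terms whose correct LOINC scored above the cutoff."""
    confident = sum(1 for score, _rank in observations if score > cutoff)
    return confident / total_terms


# Invented example: six terms with the correct LOINC in the top five,
# out of eight test terms in total.
observations = [(0.95, 1), (0.80, 1), (0.46, 2), (0.30, 1), (0.20, 3), (0.10, 5)]
cutoff = certainty_cutoff(observations)
share = share_above_cutoff(observations, cutoff, total_terms=8)
```

With the study’s own figures, the same ratio is 2,099 of 3,642 terms, or 57.6%. A cutoff derived this way describes the data it was computed from, so it would need to be checked against new test sets before being relied on.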
Such a cutoff score is valuable in separating local terms that can be mapped with little (or no) human review from those that need more extensive review.

Corpus growth and variability in mapping results across institutions

We probed the robustness of our corpus-based approach by analyzing several different test sets and evaluating performance as the corpus grew over time. These aspects are potentially relevant in deciding whether a corpus has reached critical mass to be used effectively for modeling. We observed slightly more variation in accuracy when considering entire term sets from each of our three test institutions than in our random 80/20 splits of the corpus. This suggests that institutions’ particular naming patterns can alter the mapping success even when the corpus is large. As local term mappings were added to our corpus, the growth rate in unique LOINCs decreased more than the
growth rate in unique words in term descriptions. This is a favorable pattern as it indicates a growth in diversity of words associated with LOINCs already present in the corpus. Our results showed that Maxent’s performance was not affected by the incremental growth in our corpus over time, but there was a slight decrease in Lucene’s performance.

Limitations of a data-driven paradigm and potential future research

The primary drawback of our approach is that its success is limited by the relative completeness of the underlying training corpus. Of the 3,642 local terms in our three test institutions, 46 were mapped to LOINCs with no training data, 10 had words not associated with any LOINC and 69 had words not associated with the correct LOINC in the training set. While neither Maxent nor Lucene was capable of ranking the correct LOINC for these 125 (3.4%) local terms due to limitations in the corpus or because their term descriptions were completely novel, Lab Auto Mapper ranked the correct LOINC first for 35 (28%) and among the top five for 45 (36%) of these local terms.

RELMA’s Lab Auto Mapper succeeded where our models failed by directly querying the LOINC terminology. It also uses additional information such as the units of measure associated with a local term in its algorithm, and others [5a, 9] have illustrated how extended profiles built from actual test results can be useful in mapping. Our corpus-based approach solely depends on matching words in term descriptions, and thus a global test name enhancement process such as that described by Kim et al [10] may be beneficial. In contrast to the name enhancement process, a major benefit of our approach is that it requires little domain expertise on the front end.
Evaluating the combined strengths of these different approaches; exploring the value in adding other axes such as units of measure to the data models; testing alternate algorithms for supervised machine learning; and using information retrieval models like “fuzzy search” would be valuable future research.

Our study has some other important limitations. We used a single corpus of mapped local terms from institutions in a broad but geographically defined area. Naming conventions used in other institutions may differ from our corpus in important ways that lower the accuracy of mapping with Maxent and Lucene. For instance, we have seen some institutions that use semantically meaningless descriptions such as “1001” in lieu of something that resembles a test name. Clearly, an automated mapping approach like ours would fail to map such local terms. Moreover, significant differences in naming conventions may compromise the ability to normalize term descriptions from training and test
data uniformly. Additionally, since our corpus and test sets contained predominantly laboratory terms, we do not know how well data-driven models would generalize to other important clinical measurement variables.

Considerations for practical application

Expert review is a high-cost resource in mapping. By identifying a short, accurate, ranked list of candidate LOINC codes for each local term we can optimize the process of human review. In settings where a large corpus of existing mappings is available, the Maxent model performed the best of those we evaluated and would be our recommendation for producing this ranked list. By choosing a high Maxent cutoff score (e.g. above 0.46), more than half of the local terms could likely be mapped with little or no human review. If human review of the ranked list reveals that a matching LOINC code is not present, the reviewer can default back to the typical term-specific search using interactive functions of RELMA.

While the core software tools we used in this study (Maxent and Lucene) are available at no cost under open source licenses, the corpus of local test descriptions mapped to LOINC from the INPC is not currently available publicly. Encouraged by the results of this study, the Regenstrief LOINC team recently announced a project to build a shared repository of local tests mapped to LOINC.[35] Because it is open to contributions from the global LOINC community, this new repository has the potential to serve as an important data substrate for future analyses.

Conclusion

Our study shows that a rich corpus of local terms mapped to LOINC contains collective knowledge that can help map terms from different institutions.
We developed an automated mapping approach based on supervised machine learning and information retrieval using Apache’s Maxent and Lucene, both of which are available at no cost. Our approach operates on term descriptions from existing mappings in the corpus. Overall, Maxent ranked the correct LOINC first for 79% and Lucene for 72% of local terms from our three test institutions. Using a cutoff score of 0.46 would allow Maxent to identify over 57% of local terms that always had the correct LOINC ranked first. Mapping local terms to a vocabulary standard is a necessary but resource-intensive part of integrating data from disparate systems. Accurate and efficient automated mapping methods can help accelerate adoption of vocabulary standards and promote widespread health information exchange.

Contributors
MF and DV conceived and designed the study, collected the data, evaluated the results, and wrote and edited the manuscript. MF created the models and performed the analyses.

Funding

This work was supported in part by grant 5T 15 LM007117-14 and contract HHSN2762008000006C from the National Library of Medicine and performed at the Regenstrief Institute, Indianapolis, IN.

Competing Interests

None

Figures

Figure 1: Growth in unique LOINCs mapped to local terms and unique words in local term descriptions as the number of local terms in the corpus has expanded over time.
Figure 2: Results of 20 iterations of repeated random sub-sampling validation showing the percentage of test terms with manually mapped LOINCs ranked first (top one) and among the top five by Maxent and Lucene.
Figure 3: Rank of correct LOINCs and their Maxent score for local laboratory terms from three test institutions.
Figure 4: Performance of Maxent and Lucene when applying the test set against a growing corpus of local terms.

References

1. Chaudhry B, Wang J, Wu S, Maglione M, Mojica W, Roth E, et al. Systematic review: impact of health information technology on quality, efficiency, and costs of medical care. Ann Intern Med. 2006;144.
2. Smith PC, Araya-Guerra R, Bublitz C, et al. Missing clinical information during primary care visits. JAMA. Feb 2 2005;293(5):565-571.
3. Finnell JT, Overhage JM, Grannis S. All health care is not local: an evaluation of the distribution of emergency department care delivered in Indiana. AMIA Annu Symp Proc. 2011;2011:409-16. Epub 2011 Oct 22.
4. 111th Congress of the United States of America. American Recovery and Reinvestment Act of 2009.
5. McDonald CJ, Huff SM, Suico JG, et al.
LOINC, a universal standard for identifying laboratory observations: a 5-year update. Clin Chem. 2003 Apr;49(4):624-33.
6. International LOINC downloads, linguistic variants in RELMA and translating LOINC. Available at http://loinc.org/international/. Accessed Apr 30 2012.
7. Vreeman DJ, Chiaravalloti MT, Hook J, McDonald CJ. Enabling international adoption of LOINC through translation. J Biomed Inform. 2012 Jan 21. [Epub ahead of print]
8. Baorto DM, Cimino JJ, Parvin CA, Kahn MG. Combining laboratory data sets from multiple institutions using the logical observation identifier names and codes (LOINC). Int J Med Inform. 1998 Jul;51(1):29-37.
9. Department of Health and Human Services. 45 CFR Part 170. Health information technology: Initial set of standards, implementation specifications, and certification criteria for electronic health record technology; Final Rule; published July 28, 2010.
10. Lin MC, Vreeman DJ, McDonald CJ, Huff SM. Correctness of voluntary LOINC mapping for laboratory tests in three large institutions. AMIA Annu Symp Proc. 2010 Nov 13;2010:447-51.
11. Lin MC, Vreeman DJ, Huff SM. Investigating the semantic interoperability of laboratory data exchanged using LOINC codes in three large institutions. AMIA Annu Symp Proc. 2011;2011:805-14. Epub 2011 Oct 22.
12. Vreeman DJ, Stark M, Tomashefski GL, Phillips DR, Dexter PR. Embracing change in a health information exchange. AMIA Annu Symp Proc. 2008 Nov 6:768-72.
13. Li W, Tokars JI, Lipskiy N, Ganesan S. An efficient approach to map LOINC concepts to notifiable conditions. Advances in Disease Surveillance. 2007;4:172.
14. Dugas M, Thun S, Frankewitsch T, Heitmann KU. LOINC codes for hospital information systems documents: a case study. J Am Med Inform Assoc. 2009 May-Jun;16(3):400-3. Epub 2009 Mar 4.
15. Khan AN, Griffith SP, Moore C, et al. Standardizing laboratory data by mapping to LOINC. J Am Med Inform Assoc.
2006 May-Jun;13(3):353-5. Epub 2006 Feb 24.
16. Vreeman DJ, McDonald CJ. Automated mapping of local radiology terms to LOINC. AMIA Annu Symp Proc. 2005:769-73.
17. Vreeman DJ, McDonald CJ. A comparison of Intelligent Mapper and document similarity scores for mapping local radiology terms to LOINC. AMIA Annu Symp Proc. 2006:809-1.
18. Kim H, El-Kareh R, Goel A, Vineet F, Chapman WW. An approach to improve LOINC mapping through augmentation of local test names. J Biomed Inform. 2011 Dec 21. [Epub ahead of print]
19. Lau LM, Johnson K, Monson K, Lam SH, Huff SM. A method for the automated mapping of laboratory results to LOINC. Proc AMIA Symp. 2000:472-6.
20. Zollo KA, Huff SM. Automated mapping of observation codes using extensional definitions. J Am Med Inform Assoc. 2000 Nov-Dec;7(6):586-92.
21. Lin MC, Vreeman DJ, McDonald CJ, Huff SM. Auditing consistency and usefulness of LOINC use among three large institutions - Using version spaces for grouping LOINC codes. J Biomed Inform. 2012 Jan 28. [Epub ahead of print]
22. Sun JY, Sun Y. A system for automated lexical mapping. J Am Med Inform Assoc. 2006 May-Jun;13(3):334-43. Epub 2006 Feb 24.
23. McDonald CJ, Overhage JM, Barnes M, et al. The Indiana network for patient care: a working local health information infrastructure. Health Aff (Millwood). 2005 Sep-Oct;24(5):1214-20.
24. Apache Lucene. Available at http://lucene.apache.org/. Accessed Apr 30 2012.
25. McCandless M, Hatcher E, Gospodnetic O. Lucene in action. Stamford: Manning Publications, 2010.
26. Apache OpenNLP. Available at http://opennlp.apache.org/. Accessed Apr 30 2012.
27. Berger LA, Della Pietra VJ, Della Pietra SA. A maximum entropy approach to natural language processing. Computational Linguistics. 1996 Mar;22(1):39-71.
28. Regenstrief LOINC Mapping Assistant, version 5.6. Available at http://loinc.org/. Accessed Apr 30 2012.
29. RELMA version 5.6 Users’ Manual. Available at http://loinc.org/downloads/relma. Accessed Apr 30 2012.
30. Case JT. Using RELMA. Or…in search of the missing LOINC. Available at http://loinc.org/slideshows/lab-loinc-tutorial. Accessed Apr 30 2012.
31. PoweredBy – Lucene-java Wiki. Available at http://wiki.apache.org/lucene-java/PoweredBy. Accessed Sep 26, 2012.
32. Abhyankar S, Demner-Fushman D, McDonald CJ.
Standardizing clinical laboratory data for secondary use. J Biomed Inform. 2012 Aug;45(4):642-50. Epub 2012 May 3.
33. Zunner C, Bürkle T, Prokosch HU, Ganslandt T. Mapping local laboratory interface terms to LOINC at a German university hospital using RELMA V.5: a semi-automated approach. J Am Med Inform Assoc. 2012 Jul 16. [Epub ahead of print]
34. Peters L, Kapusnik-Uner JE, Nguyen T, Bodenreider O. An approximate matching method for clinical drug names. AMIA Annu Symp Proc. 2011;2011:1117-26. Epub 2011 Oct 22.
35. Regenstrief launches Community Mapping Repository and Asks for Contributions of Existing Mappings to LOINC. Available at http://loinc.org/resolveuid/5fff22576bc94db371020ed12dbc5c34. Accessed Sep 26, 2012.
