Deriving an ICU Subset of SNOMED CT from Clinical Notes



Professor Jon Patrick
Health Information Technology Research Laboratory (HITRL -
School of Information Technologies
University of Sydney
(P38, 16/10/08, Coding stream, 3.30pm)

  • Information Extraction from Clinical Notes. Abstract: We are in the testing phase of a project at the Royal Prince Alfred Hospital that performs information extraction from clinical notes in the Intensive Care Unit. The language processing is part of a system that helps clinicians complete their ward rounds more efficiently and eases the administrative burden of record keeping. In the first stage the NLP demonstrates the automatic computation of SNOMED CT codes as clinicians write their progress notes. The system computes a tailored extract of the patient's clinical record from the ICU's information system, CareVue, relevant to the needs of reviewing the patient's case. The extract is presented on a screen to the clinician, who then types in the progress notes they wish to make. The system analyses the progress notes, computes the SNOMED CT codes in real time, and stores them back into CareVue. The system will be of significant advantage to clinicians in their ward rounds. The automatic extraction of relevant content saves the time otherwise spent manually searching the clinical information system, considered to be a saving of up to 10 minutes per patient (up to 50 patients in the ward, visited twice per day). After data entry, the conversion of clinical records into a coded system will support more efficient and more reliable data analytics. The work is expected to progress in two directions: improving the accuracy of the information extraction process, and developing a restricted natural language for data analytics grounded in the SNOMED CT coding scheme.
  • Deriving an ICU Subset of SNOMED CT from Clinical Notes

    1. 1. Deriving an ICU Subset of SNOMED CT from Clinical Notes Professor Jon Patrick Health Information Technology Research Laboratory (HITRL - School of Information Technologies University of Sydney
    2. 2. Our Strategic Objectives
        • Deliver NLP Enhancement Technologies for intelligent support and processing.
        • Build generic, compact, customisable Clinical ISs to enhance existing clinical processes.
        • Position Natural Language Processing as the base technology for processing the EMR.
        • “All clinical measurement eventually becomes language.”
    3. 3. Our HIT Activities
        • 60+ projects in 3 years
        • Language analysis of clinical texts
        • Data analysis of the EHR
        • Converting clinical narratives to SNOMED CT
        • Mapping other coding systems to SNOMED CT
        • Generating ICU & ED subsets of SNOMED CT
        • Building Clinical Information Systems - WeBCIS
        • Generating Clinical Information Systems - GCIMS
        • Rescuing data from abandoned and decaying ISs - OMNI-LAB, HOSLAB, CARDS, BS, HOSREP
        • Partners: RPAH, NCCH, SWAHS, Children’s Westmead, Path Labs, and more
    4. 4. Enhancement Technologies Active Projects
        • Ward Rounds Information Systems
        • Clinical Data Analytics Language
        • Structured Reporting - Pathology + Imaging
        • Handovers Information System
    5. 5. A corpus of ICU Notes
        • ICU corpus of 44 million words
        • Derived all SNOMED codes
        • Inferred SCT subset: 2700 concepts cover 96% of usage; 20,000 are needed for 100%
        • Use for spell checking in an automatic processor
        • Aim to improve the quality of medical documentation to enhance automatic processing for other purposes, e.g. DSS
        • Future: prospective studies to understand the effect on support for patient care and safety
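
The coverage figures on this slide (a small set of concepts covering most of the usage) come from ranking concept codes by frequency and accumulating their counts. The following is a minimal sketch of that calculation, not the project's own code; the function name and the toy counts are hypothetical and only illustrate the arithmetic.

```python
from collections import Counter


def concepts_needed_for_coverage(code_counts, target):
    """Return how many of the most frequent codes are needed so that
    their combined count reaches `target` (a fraction between 0 and 1)
    of all code occurrences."""
    total = sum(code_counts.values())
    if total == 0:
        return 0
    running = 0
    # most_common() yields (code, count) pairs in descending count order.
    for rank, (_, count) in enumerate(code_counts.most_common(), start=1):
        running += count
        if running / total >= target:
            return rank
    return len(code_counts)


# Toy, made-up frequencies (not data from the ICU corpus).
toy_counts = Counter({"A": 50, "B": 30, "C": 15, "D": 4, "E": 1})
needed_for_96 = concepts_needed_for_coverage(toy_counts, 0.96)
needed_for_all = concepts_needed_for_coverage(toy_counts, 1.0)
```

On real data the same loop would be run over the histogram of extracted codes; the long tail is what separates the 96% figure from the 100% figure.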
    6. 9. An approach for inferring the SNOMED CT subset
        1. Identify all the SCT candidates in the clinical notes - this is the extracted set.
        2. Construct a histogram of all the concept codes in the extracted set and separate them into the 17 SCT upper level categories.
        3. Import the code frequency tables into one table to be a fair sample of concepts.
        4. From that extracted set, select the codes that were used at least 100 times (an arbitrary cut-off point) - this is the reduced set.
        5. Using appropriate software, compute the minimum spanning closure across the reduced set - this is the closure set.
        6. Clean the closure set, manually and semi-automatically, to remove anomalous concepts and to correct defective SNOMED modeling - this is the clean closure set.
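
Steps 4 and 5 above can be sketched in a few lines. This is an illustration under stated assumptions, not the software used in the project: the slides do not define "minimum spanning closure", so the sketch substitutes a plain upward closure over is-a parents (every ancestor of every retained code), which may be larger than a minimal one. The function names, the toy hierarchy and the toy codes are all hypothetical.

```python
from collections import Counter


def reduced_set(extracted_codes, min_count=100):
    """Step 4: keep the codes that occur at least `min_count` times."""
    counts = Counter(extracted_codes)
    return {code for code, n in counts.items() if n >= min_count}


def ancestor_closure(codes, parents):
    """Step 5 (assumed interpretation): add every ancestor reachable
    through the `parents` mapping (child -> list of is-a parents)."""
    closure = set()
    stack = list(codes)
    while stack:
        code = stack.pop()
        if code in closure:
            continue
        closure.add(code)
        stack.extend(parents.get(code, ()))
    return closure


# Toy hierarchy with made-up concept names, not real SNOMED CT content.
toy_parents = {
    "pneumonia": ["lung_disease"],
    "lung_disease": ["disorder"],
    "fracture": ["injury"],
    "injury": ["disorder"],
    "disorder": ["root"],
}
toy_codes = ["pneumonia"] * 3 + ["fracture"]
toy_reduced = reduced_set(toy_codes, min_count=2)
toy_closure = ancestor_closure(toy_reduced, toy_parents)
```

The threshold is a parameter because the slide itself calls the cut-off of 100 arbitrary; the cleaning in step 6 is a manual review and is not modelled here.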
    7. 10. Sample of Notes and SCT Mapping
    8. 11. Sources of Errors
        • An incomplete lexical retinue. An example of this problem is the widespread use of the word “bibasal” in Australian hospitals whilst it is unrecorded in SCT.
        • False positives, which include incorrectly deduced SCT concepts due to weaknesses in the natural language processing methods.
        • False negatives, which are missing when forming the minimal sub-tree and the transitive closure. This may not be a very large problem as many such terms may be identified through the process of building the transitive closure.
        • Issues with the internationalisation of spelling (for example anaemia versus anemia), orthographic errors, neologisms, grammatical errors, and typing errors due to English as a second language.
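
Two of the error sources above, spelling variants and terms missing from the terminology, are commonly handled by normalising tokens before lookup. The sketch below shows one possible shape for that step and is not the method used in the project. The three word lists are tiny hypothetical stand-ins, and the fuzzy match uses Python's standard `difflib.get_close_matches` with an arbitrarily chosen cutoff.

```python
import difflib

# Hypothetical stand-ins for real resources.
SPELLING_VARIANTS = {"anaemia": "anemia", "oedema": "edema"}
LOCAL_TERMS = {"bibasal"}  # used locally, not recorded in the terminology
TERMINOLOGY_WORDS = {"anemia", "edema", "pneumonia"}


def normalise_token(token):
    """Return (normalised_word, source) for a single token, where source
    is 'terminology', 'local', 'corrected' or 'unknown'."""
    word = token.lower()
    # Map a known regional spelling onto the terminology's spelling.
    word = SPELLING_VARIANTS.get(word, word)
    if word in TERMINOLOGY_WORDS:
        return word, "terminology"
    if word in LOCAL_TERMS:
        return word, "local"
    # Fall back to a fuzzy match for likely typing errors.
    close = difflib.get_close_matches(
        word, sorted(TERMINOLOGY_WORDS), n=1, cutoff=0.8
    )
    if close:
        return close[0], "corrected"
    return word, "unknown"
```

For instance, `normalise_token("Anaemia")` resolves through the variant table, while `normalise_token("bibasal")` is caught by the local list rather than being forced onto a wrong terminology entry.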
    9. 12. The Full Project Requirements
        • a more reliable extraction from the source corpus with improved orthographic corrections for more accurate lexical verification,
        • identification of the minimum spanning closure from the complete set of SCT categories,
        • removing from the transitive closure inappropriate and poorly modeled concepts,
        • implementation of the subset in an ICU clinical information system for testing its efficiency,
        • testing the subset against ICU notes from another institution to assess its generalisability.
    10. 13. Advantages of Reduced Subset of SCT
        • increase the speed of computation and hence make real-time searching across the SCT logical model more practicable,
        • decrease the number of false hits and so make the user interfaces that exploit SNOMED more accurate and efficient, saving staff time,
        • permit the use of description logic software to process concept generalisations,
        • lead the way in the development of a methodology for deriving SNOMED subsets for other clinical specialities using their clinical progress notes.
    11. 14. Use of SCT Encoding of Clinical Notes
        • Indexing notes for SNOMED codes enables:
            • Research & Operational Information Retrieval
            • Data Analytics
            • Audit of Care
            • Clinician training for stable terminology
            • Extension to Customisable Handovers ISs
    12. 15. A Clinical Data Analytics Language - CliniDAL Principles
        • It can express all questions that are answerable from the database, including from narrative content
        • It can compute all questions that can be expressed
        • It is transportable across all Clinical ISs
    13. 16. Clinical Data Analytics Language (CliniDAL) - Practicals
        • Need for general purpose Information Extraction
            • Over aggregated data
            • Constrained by many variables
            • Over the text notes in the patient record
            • From a wide range of Information Systems
            • Using a wide range of health dialects
    14. 17. CDAL Request - Basic Structure
        • Nominates
            • Terminology/Ontology/Classifications
            • Physical Databases
            • Statistical Variable or Expression of key interest
            • Patient Grouping
            • Medical Expressions
            • Time Constraints
            • Location Constraints
        • {Using <SNOMED>} in {<ICU-db>}
        • Find <AVG(Stay)> of <men under 40> + {with <3rd degree burns to the left hand treated with amoxycillin>}
        • {<during the last 2 years>} from {<postcodes 2300-2999>}
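
The components nominated on this slide map naturally onto a small record type. The sketch below is only an illustration of that structure: the class name, field names and the `render` output are hypothetical and are not CliniDAL's actual syntax or implementation, which the slides do not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CdalRequest:
    """One field per component nominated on the slide (names assumed)."""
    terminology: str
    database: str
    statistic: str
    patient_group: str
    medical_expression: str
    time_constraint: str
    location_constraint: str

    def render(self) -> str:
        # Illustrative rendering only; not the real query grammar.
        return (
            f"Using {self.terminology} in {self.database}: "
            f"find {self.statistic} of {self.patient_group} "
            f"with {self.medical_expression} "
            f"{self.time_constraint} from {self.location_constraint}"
        )


# The example request from the slide, split into its components.
example_request = CdalRequest(
    terminology="SNOMED",
    database="ICU-db",
    statistic="AVG(Stay)",
    patient_group="men under 40",
    medical_expression=(
        "3rd degree burns to the left hand treated with amoxycillin"
    ),
    time_constraint="during the last 2 years",
    location_constraint="postcodes 2300-2999",
)
```

Keeping the terminology and database as explicit fields reflects the stated principle that the language should be transportable across clinical information systems.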
    15. 18. Screenshot of a CDAL query: ARDS SNIFFER : Find all patients’ medical record number (and the number of records retrieved) for patients with age > 16, [AND] arterial blood gas analysis (PaO2 / FiO2) < 300 AND Tidal Volume Peak Pressures (Paw) > 35 OR Delivered tidal volume (Vt) > 8mL IN the GICU (over the last year). Note that: PaO2 / FiO2 = PF Ratio; Paw = PIP; Delivered Vt = Vt Expired
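
The ARDS sniffer criteria above can be read as a simple record filter. The sketch below is a hedged illustration, not the CDAL engine: the record type and field names are hypothetical, the grouping of the final OR (peak pressure OR tidal volume) is an assumption because the slide does not bracket it, the tidal-volume threshold is taken as written on the slide, and the restriction to the GICU over the last year is left out.

```python
from dataclasses import dataclass


@dataclass
class IcuRecord:
    mrn: str              # medical record number
    age: float            # years
    pf_ratio: float       # PaO2 / FiO2
    peak_pressure: float  # Paw (PIP)
    tidal_volume: float   # delivered Vt, in the unit used on the slide


def matches_ards_sniffer(record):
    """Assumed grouping: age AND PF ratio AND (Paw OR Vt)."""
    return (
        record.age > 16
        and record.pf_ratio < 300
        and (record.peak_pressure > 35 or record.tidal_volume > 8)
    )


def find_ards_candidates(records):
    """Return the matching MRNs and the number of records retrieved."""
    mrns = [r.mrn for r in records if matches_ards_sniffer(r)]
    return mrns, len(mrns)


# Made-up records for illustration only.
toy_records = [
    IcuRecord("M1", age=54, pf_ratio=210, peak_pressure=38, tidal_volume=6),
    IcuRecord("M2", age=12, pf_ratio=150, peak_pressure=40, tidal_volume=9),
    IcuRecord("M3", age=70, pf_ratio=280, peak_pressure=30, tidal_volume=9),
    IcuRecord("M4", age=45, pf_ratio=350, peak_pressure=40, tidal_volume=9),
]
toy_mrns, toy_count = find_ards_candidates(toy_records)
```

If the intended grouping were different, only the boolean expression in `matches_ards_sniffer` would change.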
    16. 19. Accessible attributes in ICU-CDAL - CareVue
        • Chart_events (total): 786
            • Chart_events (numeric): 734
            • Chart_events (categorical): 52
        • Medication_events: 52
        • Patient_events: 6
        • Lab_events: 63
        • Group_events (total): 74
            • Sedation: 8
            • Inotropes: 14
            • Antibiotics: 46
            • Thromboebolic_prophylaxis:
    17. 20. Proposed Developments for SCT Subsets
        • Make CDAL more portable and reusable
        • Expand the language processing in Ward Rounds and Handover systems
        • ICRAIS - Intensive Care Real-time Audit IS
        • Compact Nursing ISs using NIC & NOC
        • Information Exchange: fetching and delivering information from all hospital ISs
    18. 21. Generative Clinical Information Management Systems (GCIMS)
        • Allow each department to specify its own Clinical IS in a forms description language
        • Link each data item in the CIS to a unique concept code, e.g. SNOMED CT
        • Supply a universal & comprehensive retrieval language
        • Supply a workflow engine
        • SCT subsets will be needed for each Clinical IS
    19. 22. Features of Enhancement Technologies
        • Compact Customisable ISs
        • None is mission critical, but all give
            • High productivity,
            • Enhanced patient safety and outcomes, and
            • Unheralded access to data, especially text
            • Bolt-on technologies
            • Tailored and managed to suit local clinical needs
            • Removable at any time to allow return to original processes
            • Can fetch and deliver from other systems
    20. 23. THE END Health Information Technology Research Laboratory (HITRL)