Deriving an ICU Subset of SNOMED CT from Clinical Notes


Professor Jon Patrick
Health Information Technology Research Laboratory (HITRL)
School of Information Technologies
University of Sydney
(P38, 16/10/08, Coding stream, 3.30pm)


  1. Deriving an ICU Subset of SNOMED CT from Clinical Notes: Professor Jon Patrick, Health Information Technology Research Laboratory (HITRL), School of Information Technologies, University of Sydney
  2. Our Strategic Objectives
     • Deliver NLP Enhancement Technologies for intelligent support and processing.
     • Build generic, compact, customisable Clinical ISs to enhance existing clinical processes.
     • Position Natural Language Processing as the base technology for processing the EMR.
     • “All clinical measurement eventually becomes language.”
  3. Our HIT Activities
     • 60+ projects in 3 years
     • Language analysis of clinical texts
     • Data analysis of the EHR
     • Converting clinical narratives to SNOMED CT
     • Mapping other coding systems to SNOMED CT
     • Generating ICU & ED subsets of SNOMED CT
     • Building Clinical Information Systems: WeBCIS
     • Generating Clinical Information Systems: GCIMS
     • Rescuing data from abandoned and decaying ISs: OMNI-LAB, HOSLAB, CARDS, BS, HOSREP
     • Partners: RPAH, NCCH, SWAHS, Children’s Westmead, Path Labs, and more
  4. Enhancement Technologies: Active Projects
     • Ward Rounds Information Systems
     • Clinical Data Analytics Language
     • Structured Reporting: Pathology + Imaging
     • Handovers Information System
  5. A Corpus of ICU Notes
     • ICU corpus of 44 million words
     • Derived all SNOMED codes
     • Inferred SCT subset: 2,700 concepts cover 96% of usage; 20,000 are needed for 100%
     • Used for spell checking in an automatic processor
     • Aim: improve the quality of medical documentation to enhance automatic processing for other purposes, e.g. decision support systems (DSS)
     • Future: prospective studies to understand the effect on support for patient care and safety
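Slide 5's coverage figure (2,700 concepts covering 96% of usage) is a cumulative-frequency cut over the concept histogram. The Python below is a minimal sketch of that calculation, assuming mention counts per concept are already available; the function name `concepts_for_coverage` and the toy counts are hypothetical, not taken from the project:

```python
from collections import Counter


def concepts_for_coverage(counts: Counter, target: float) -> list:
    """Return the most frequent concepts whose combined mentions first reach
    `target` (a fraction between 0 and 1) of all concept mentions."""
    total = sum(counts.values())
    if total == 0:
        return []
    selected = []
    covered = 0
    # most_common() yields concepts in descending order of frequency.
    for concept, n in counts.most_common():
        if covered / total >= target:
            break
        selected.append(concept)
        covered += n
    return selected


if __name__ == "__main__":
    # Hypothetical toy counts, not figures from the ICU corpus.
    toy = Counter({"pneumonia": 50, "sepsis": 30, "hypotension": 15, "bibasal crackles": 5})
    print(concepts_for_coverage(toy, 0.9))  # ['pneumonia', 'sepsis', 'hypotension']
```

Applied to the real corpus histogram, the length of the returned list at `target=0.96` is the quantity the slide reports.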
  9. An Approach for Inferring the SNOMED CT Subset
     1. Identify all the SCT candidates in the clinical notes; this is the extracted set.
     2. Construct a histogram of all the concept codes in the extracted set and separate them into the 17 SCT upper-level categories.
     3. Merge the code frequency tables into one table so that it forms a fair sample of concepts.
     4. From that extracted set, select the codes that were used at least 100 times (an arbitrary cut-off point); this is the reduced set.
     5. Using appropriate software, compute the minimum spanning closure across the reduced set; this is the closure set.
     6. Clean the closure set semi-automatically, with manual review, to remove anomalous concepts and to correct defective SNOMED modeling; this is the clean closure set.
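Steps 2 to 5 can be sketched in Python as below. This is a hedged illustration, not the project's software: the toy is-a hierarchy is hypothetical and heavily simplified, and the "minimum spanning closure" of step 5 is approximated here as the upward transitive closure of the reduced set (every ancestor of every retained code, up to the root). Step 6, the semi-automatic cleaning, is not modelled.

```python
from collections import Counter, defaultdict

ROOT = "SNOMED CT Concept"

# Hypothetical, heavily simplified is-a hierarchy (child -> parents).
# It is NOT real SNOMED CT modelling; it only gives the functions something to walk.
IS_A = {
    "Pneumonia": ["Disorder of lung"],
    "Disorder of lung": ["Clinical finding"],
    "Sepsis": ["Clinical finding"],
    "Clinical finding": [ROOT],
    "Intubation": ["Procedure"],
    "Procedure": [ROOT],
}


def top_level_category(code, is_a=IS_A):
    """Follow the first parent upwards until reaching a concept directly under the root."""
    while True:
        parents = is_a.get(code, [])
        if not parents or parents[0] == ROOT:
            return code
        code = parents[0]


def histogram_by_category(extracted, category_of):
    """Step 2: count every extracted code, grouped by its top-level category."""
    hist = defaultdict(Counter)
    for code in extracted:
        hist[category_of(code)][code] += 1
    return hist


def reduced_set(hist, min_count=100):
    """Steps 3-4: merge the per-category tables and keep codes used >= min_count times."""
    merged = Counter()
    for table in hist.values():
        merged.update(table)  # Counter.update adds counts
    return {code for code, n in merged.items() if n >= min_count}


def closure_set(codes, is_a=IS_A):
    """Step 5 (approximation): add every ancestor of every code, up to the root."""
    closure = set()
    stack = list(codes)
    while stack:
        code = stack.pop()
        if code in closure:
            continue
        closure.add(code)
        stack.extend(is_a.get(code, []))
    return closure


if __name__ == "__main__":
    extracted = ["Pneumonia"] * 120 + ["Sepsis"] * 150 + ["Intubation"] * 40
    hist = histogram_by_category(extracted, top_level_category)
    reduced = reduced_set(hist)  # Intubation falls below the 100-use cut-off
    print(sorted(closure_set(reduced)))
```

The closure step is what pulls intermediate concepts such as "Disorder of lung" into the subset even though they were never mentioned often enough to pass the cut-off themselves.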
  10. Sample of Notes and SCT Mapping
  11. Sources of Errors
     • An incomplete lexicon. For example, the word “bibasal” is widely used in Australian hospitals but is not recorded in SCT.
     • False positives: incorrectly deduced SCT concepts, caused by weaknesses in the natural language processing methods.
     • False negatives: concepts missed during extraction, which are therefore absent when forming the minimal sub-tree and the transitive closure. This may not be a large problem, as many such terms may be recovered through the process of building the transitive closure.
     • Issues with the internationalisation of spelling (for example, anaemia versus anemia), orthographic errors, neologisms, grammatical errors, and typing errors due to English as a second language.
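The spelling issues listed above (variant spellings such as anaemia/anemia, typing errors, and words like “bibasal” missing from the lexicon) suggest a normalisation step before lexical lookup. Below is a minimal Python sketch, assuming a small hypothetical variant map and lexicon; it uses plain Levenshtein edit distance, which is one common choice and not necessarily the method used in this project:

```python
from typing import Optional, Set

# Hypothetical British -> American variant map; a real one would be much larger.
VARIANTS = {"anaemia": "anemia", "oedema": "edema", "haemorrhage": "hemorrhage"}


def levenshtein(a: str, b: str) -> int:
    """Edit distance where insertion, deletion and substitution each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def normalise(token: str, lexicon: Set[str], max_dist: int = 2) -> Optional[str]:
    """Map a token to a lexicon term: first via the variant map, then by closest
    edit distance. Returns None when nothing is close enough (a lexical gap)."""
    word = VARIANTS.get(token.lower(), token.lower())
    if word in lexicon:
        return word
    # Ties on distance are broken alphabetically so the result is deterministic.
    best = min(lexicon, key=lambda term: (levenshtein(word, term), term), default=None)
    if best is not None and levenshtein(word, best) <= max_dist:
        return best
    return None


if __name__ == "__main__":
    lexicon = {"anemia", "edema", "pneumonia"}
    print(normalise("Anaemia", lexicon))    # anemia (variant map)
    print(normalise("pnuemonia", lexicon))  # pneumonia (edit distance 2)
    print(normalise("bibasal", lexicon))    # None: a gap in the lexicon
```

Returning `None` rather than forcing a match keeps genuine lexical gaps, such as “bibasal”, visible for review instead of silently mapping them to a wrong concept.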
  12. The Full Project Requirements
     • more reliable extraction from the source corpus, with improved orthographic correction for more accurate lexical verification;
     • identification of the minimum spanning closure from the complete set of SCT categories;
     • removal of inappropriate and poorly modeled concepts from the transitive closure;
     • implementation of the subset in an ICU clinical information system to test its efficiency;
     • testing the subset against ICU notes from another institution to assess its generalisability.
  13. Advantages of a Reduced Subset of SCT
     • increase the speed of computation, and hence make real-time searching across the SCT logical model more practicable;
     • decrease the number of false hits, making user interfaces that exploit SNOMED more accurate and efficient and so saving staff time;
     • permit the use of description logic software to process concept generalisations;
     • lead the way in developing a methodology for deriving SNOMED subsets for other clinical specialities from their clinical progress notes.
  14. Use of SCT Encoding of Clinical Notes
     • Indexing notes for SNOMED codes enables:
        ◦ Research & operational information retrieval
        ◦ Data analytics
        ◦ Audit of care
        ◦ Clinician training for stable terminology
        ◦ Extension to customisable Handovers ISs
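Indexing notes by SNOMED codes is most useful for retrieval when a query for a general concept also returns notes coded with its more specific descendants. A minimal Python sketch of such an inverted index, assuming a hypothetical toy is-a hierarchy and concept labels in place of real SCT identifiers:

```python
from collections import defaultdict

# Hypothetical, simplified is-a hierarchy (child -> parents); not real SNOMED CT content.
IS_A = {
    "Pneumonia": ["Disorder of lung"],
    "Asthma": ["Disorder of lung"],
    "Disorder of lung": ["Clinical finding"],
    "Sepsis": ["Clinical finding"],
}


def build_index(notes):
    """Inverted index: concept -> set of IDs of the notes coded with it."""
    index = defaultdict(set)
    for note_id, concepts in notes.items():
        for concept in concepts:
            index[concept].add(note_id)
    return index


def descendants(concept, is_a=IS_A):
    """All concepts below `concept` in the is-a hierarchy."""
    children = defaultdict(list)
    for child, parents in is_a.items():
        for parent in parents:
            children[parent].append(child)
    found, stack = set(), [concept]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def find_notes(concept, index, is_a=IS_A):
    """Notes coded with `concept` or any of its descendants (a subsumption query)."""
    targets = {concept} | descendants(concept, is_a)
    return set().union(*(index.get(c, set()) for c in targets))


if __name__ == "__main__":
    notes = {"n1": ["Pneumonia"], "n2": ["Asthma", "Sepsis"], "n3": ["Sepsis"]}
    index = build_index(notes)
    print(sorted(find_notes("Disorder of lung", index)))  # ['n1', 'n2']
```

A smaller subset of SCT keeps the descendant expansion cheap, which is one concrete reading of the speed advantage claimed on slide 13.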
  15. A Clinical Data Analytics Language: CliniDAL Principles
     • It can express all questions that are answerable from the database, including from narrative content.
     • It can compute all questions that can be expressed.
     • It is transportable across all Clinical ISs.
  16. Clinical Data Analytics Language (CliniDAL): Practicals
     • Need for general-purpose information extraction:
        ◦ Over aggregated data
        ◦ Constrained by many variables
        ◦ Over the text notes in the patient record
        ◦ From a wide range of information systems
        ◦ Using a wide range of health dialects
  17. CDAL Request: Basic Structure
     • A request nominates:
        ◦ Terminology/Ontology/Classifications
        ◦ Physical databases
        ◦ Statistical variable or expression of key interest
        ◦ Patient grouping
        ◦ Medical expressions
        ◦ Time constraints
        ◦ Location constraints
     • Example request:
           {Using <SNOMED>} in {<ICU-db>}
           Find <AVG(Stay)> of <men under 40> + {with <3rd degree burns to the left hand treated with amoxycillin>}
           {<during the last 2 years>} from {<postcodes 2300-2999>}
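The nominated parts of a CDAL request can be held in a simple structure and evaluated against patient data. The sketch below is hypothetical throughout: the record layout, the field names and the `evaluate` function are assumptions for illustration, not the CDAL implementation, and constraints are written as Python predicates rather than CDAL syntax. It mirrors the example request (average stay of men under 40 with the stated burn and treatment, within a time window and a postcode range):

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean
from typing import Callable, FrozenSet, List, Optional


@dataclass
class Admission:
    """Hypothetical flattened admission record; real schemas will differ."""
    sex: str
    age: int
    stay_days: float
    admitted: date
    postcode: int
    concepts: FrozenSet[str]  # concept labels coded from the notes


@dataclass
class CdalRequest:
    """One field per part a CDAL request nominates (names are illustrative).
    terminology and database would select the code system and data source;
    this sketch simply evaluates over an in-memory list."""
    terminology: str                                 # e.g. "SNOMED"
    database: str                                    # e.g. "ICU-db"
    statistic: Callable[[List[float]], float]        # e.g. mean for AVG
    variable: Callable[[Admission], float]           # e.g. length of stay
    patient_group: Callable[[Admission], bool]       # e.g. men under 40
    medical_expression: Callable[[Admission], bool]  # e.g. burn + treatment
    since: date                                      # time constraint
    postcodes: range                                 # location constraint


def evaluate(req: CdalRequest, rows: List[Admission]) -> Optional[float]:
    """Apply every constraint, then the statistic; None if no rows match."""
    values = [req.variable(r) for r in rows
              if req.patient_group(r) and req.medical_expression(r)
              and r.admitted >= req.since and r.postcode in req.postcodes]
    return req.statistic(values) if values else None


if __name__ == "__main__":
    burn = frozenset({"3rd degree burn of left hand", "amoxycillin"})
    rows = [
        Admission("M", 35, 10.0, date(2007, 5, 1), 2500, burn),
        Admission("M", 38, 4.0, date(2008, 1, 9), 2999, burn),
        Admission("F", 30, 20.0, date(2007, 5, 1), 2500, burn),  # excluded: not a man
    ]
    request = CdalRequest(
        terminology="SNOMED", database="ICU-db",
        statistic=mean, variable=lambda a: a.stay_days,
        patient_group=lambda a: a.sex == "M" and a.age < 40,
        medical_expression=lambda a: {"3rd degree burn of left hand", "amoxycillin"} <= a.concepts,
        since=date(2006, 10, 16),     # "during the last 2 years", fixed for repeatability
        postcodes=range(2300, 3000),  # postcodes 2300-2999 inclusive
    )
    print(evaluate(request, rows))  # 7.0
```

Keeping each nominated part as a separate field makes the request easy to port: only the data-access layer behind `rows` would change between Clinical ISs.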
  18. Screenshot of a CDAL query, ARDS SNIFFER: find the medical record numbers of all patients (and the number of records retrieved) with age > 16 [AND] arterial blood gas analysis (PaO2/FiO2) < 300 AND Tidal Volume Peak Pressures (Paw) > 35 OR Delivered tidal volume (Vt) > 8 mL IN the GICU (over the last year). Note: PaO2/FiO2 = PF Ratio; Paw = PIP; Delivered Vt = Vt Expired.
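The ARDS SNIFFER criteria can be written as a filter over charted observations. This Python sketch is an assumption-laden illustration, not the query shown in the screenshot: the record fields are hypothetical, the slide's mixed AND/OR is read as age > 16 AND PF ratio < 300 AND (Paw > 35 OR Vt > 8), and the Vt threshold is copied as written, so its intended unit should be checked before use:

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass
class Observation:
    """Hypothetical flattened charting row; field names are illustrative."""
    mrn: str
    age: int
    unit: str
    charted: date
    pao2: float        # arterial O2 partial pressure
    fio2: float        # inspired O2 fraction, 0.21-1.0
    paw: float         # peak airway pressure (PIP)
    vt_expired: float  # delivered tidal volume (Vt expired)


def pf_ratio(o: Observation) -> float:
    """PaO2/FiO2, the PF Ratio named on the slide."""
    return o.pao2 / o.fio2


def ards_sniffer(rows: List[Observation], since: date,
                 unit: str = "GICU") -> Tuple[List[str], int]:
    """Return the matching MRNs (deduplicated, sorted) and their count.
    Assumed precedence: age > 16 AND PF < 300 AND (Paw > 35 OR Vt > 8)."""
    mrns = {
        o.mrn for o in rows
        if o.unit == unit and o.charted >= since
        and o.age > 16 and pf_ratio(o) < 300
        and (o.paw > 35 or o.vt_expired > 8)  # 8 as written on the slide ("8mL")
    }
    return sorted(mrns), len(mrns)


if __name__ == "__main__":
    rows = [
        Observation("A1", 60, "GICU", date(2008, 3, 1), 80, 0.5, 40, 6),
        Observation("A4", 40, "GICU", date(2008, 3, 1), 60, 0.4, 20, 9),
        Observation("A3", 50, "GICU", date(2008, 3, 1), 100, 0.21, 40, 9),  # PF > 300
    ]
    print(ards_sniffer(rows, since=date(2007, 10, 16)))
```

Deduplicating on MRN matters because one patient usually has many charted rows that meet the criteria.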
  19. Accessible Attributes in ICU-CDAL (CareVue)
     • Chart_events (total): 786
        ◦ Chart_events (numeric): 734
        ◦ Chart_events (categorical): 52
     • Medication_events: 52
     • Patient_events: 6
     • Lab_events: 63
     • Group_events (total): 74
        ◦ Sedation: 8
        ◦ Inotropes: 14
        ◦ Antibiotics: 46
        ◦ Thromboebolic_prophylaxis:
  20. Proposed Developments for SCT Subsets
     • Make CDAL more portable and reusable
     • Expand the language processing in Ward Rounds and Handover systems
     • ICRAIS: Intensive Care Real-time Audit IS
     • Compact Nursing ISs using NIC & NOC
     • Information exchange: fetching and delivering information from all hospital ISs
  21. Generative Clinical Information Management Systems (GCIMS)
     • Allow each department to specify its own Clinical IS in a forms description language
     • Link each data item in the CIS to a unique concept code, e.g. SNOMED CT
     • Supply a universal & comprehensive retrieval language
     • Supply a workflow engine
     • SCT subsets will be needed for each Clinical IS
  22. Features of Enhancement Technologies
     • Compact customisable ISs
     • None is mission critical, but all give:
        ◦ High productivity,
        ◦ Enhanced patient safety and outcomes, and
        ◦ Unprecedented access to data, especially text
     • Bolt-on technologies
     • Tailored and managed to suit local clinical needs
     • Removable at any time to allow return to original processes
     • Can fetch and deliver from other systems
  23. THE END: Health Information Technology Research Laboratory (HITRL)