Rescuing Data from Decaying and Moribund Clinical Information Systems


Professor Jon Patrick
Health Information Technology Research Laboratory (HITRL)
School of Information Technologies
University of Sydney
(P39, 17/10/08, Systems & Methods stream, 1.50pm)

  • Rescuing Data from Decaying and Moribund Clinical Information Systems

    1. 1. Rescuing Data from Decaying and Moribund Clinical Information Systems
          Jon Patrick, Peng Gao, Xin Li, Victor Zhou
          Health Information Technology Research Laboratory, School of Information Technologies
    2. 2. Objectives
          • Identify a systematic methodology for extracting data from decaying and moribund clinical information systems
          • Install the data in a contemporary DBMS for data preservation and research analytics
    3. 3. Background
    4. 4. Methods
          • Data Model Reverse Engineering (DMRE)
          • Data Model Minimisation (DMM)
          • Data Migration (DM)
    5. 5. Data Model Reverse Engineering (DMRE)
          1. Data Dictionary Extraction.
          2. Classification of the Entities and Attributes.
          3. Discovering the Relationships and Dependencies.
          4. Generating an Entity Relationship Model (ERM) for the legacy system.
          5. Discovering the Business Processes and Rules.
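The first and third DMRE steps can be illustrated with a minimal sketch. It assumes the legacy data can be reached through a SQL catalogue; SQLite (from the Python standard library) stands in for the legacy system here, and the `patient` and `request` tables, their columns, and the name-matching heuristic for relationships are all hypothetical, not taken from the slides.

```python
import sqlite3

def extract_data_dictionary(conn):
    """Return {table: [(column, declared_type, is_primary_key), ...]}."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    dictionary = {}
    for table in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        cols = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        dictionary[table] = [(c[1], c[2], bool(c[5])) for c in cols]
    return dictionary

def candidate_relationships(dictionary):
    """Guess links: a non-key column named like another table's primary key."""
    primary_keys = {col: table
                    for table, cols in dictionary.items()
                    for col, _type, is_pk in cols if is_pk}
    links = []
    for table, cols in dictionary.items():
        for col, _type, is_pk in cols:
            if not is_pk and col in primary_keys and primary_keys[col] != table:
                links.append((table, col, primary_keys[col]))
    return sorted(links)

# Hypothetical two-table "legacy" schema, for illustration only.
legacy = sqlite3.connect(":memory:")
legacy.executescript("""
CREATE TABLE patient (patient_id INTEGER PRIMARY KEY, surname TEXT);
CREATE TABLE request (request_id INTEGER PRIMARY KEY,
                      patient_id INTEGER, result_text TEXT);
""")
data_dictionary = extract_data_dictionary(legacy)
links = candidate_relationships(data_dictionary)
for table, column, target in links:
    print(f"{table}.{column} probably refers to {target}")
```

Candidate links found this way are only guesses; in practice they would need to be confirmed against the data and against the people who know the system.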
    6. 7. Data Model Minimisation (DMM)
          • Construct the archive Entity Relationship Model:
          • Identify the parts of the original model that are no longer needed:
              • operational needs (e.g. billing),
              • superseded by later technology (e.g. hardware descriptions)
          • Create the archival schema in MySQL:
              • use the MySQL interface to create the physical database matching the newly designed schema.
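As a rough sketch of the minimisation step, the snippet below drops the parts of a model that are no longer needed and creates the remaining tables. Everything in it is an assumption for illustration: the table names, the reasons recorded for dropping them, and the use of SQLite in place of MySQL so the example runs without a server.

```python
import sqlite3

# Hypothetical legacy model: table -> {column: SQL type}.
LEGACY_MODEL = {
    "patient": {"patient_id": "INTEGER", "surname": "TEXT"},
    "request": {"request_id": "INTEGER", "patient_id": "INTEGER",
                "result_text": "TEXT"},
    "billing": {"invoice_id": "INTEGER", "amount": "REAL"},
    "printer_config": {"device_id": "INTEGER", "driver": "TEXT"},
}

# Tables judged unnecessary for the archive, with the reason recorded.
DROPPED = {
    "billing": "operational need only",
    "printer_config": "superseded hardware description",
}

def minimise(model, dropped):
    """Keep only the tables that are not marked as dropped."""
    return {table: cols for table, cols in model.items() if table not in dropped}

def create_statements(model):
    """Generate one CREATE TABLE statement per table in the model."""
    statements = []
    for table, cols in model.items():
        col_sql = ", ".join(f"{name} {sqltype}" for name, sqltype in cols.items())
        statements.append(f"CREATE TABLE {table} ({col_sql})")
    return statements

archive_model = minimise(LEGACY_MODEL, DROPPED)
archive = sqlite3.connect(":memory:")  # stand-in for the MySQL archive
for statement in create_statements(archive_model):
    archive.execute(statement)

archive_tables = sorted(row[0] for row in archive.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"))
print(archive_tables)
```

Keeping the reason next to each dropped table gives the archive a record of what was left out and why.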
    7. 8. Figure 2
    8. 9. Data Migration (DM)
          • Export raw data from the legacy system:
              • Requires reading data from the legacy database and placing it in temporary files.
          • Data cleaning:
              • Checking whether data values are legal for the data type. This uses programs written to check for valid data types and values.
          • Importing the data into the MySQL archival data warehouse:
              • Records are re-assembled to match the new data model. This is not a simple read of the records into a database. Commonly the data has been reorganised, so records of different types have to be brought together into composite records that span tables differently from their configuration in the legacy system.
          • Testing: 20 queries verified against the old and the new databases.
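The cleaning and import steps might look something like the following sketch. It is not the programs used in the project: the exported columns, the sample rows, and the two validity rules (integer id, real calendar date) are hypothetical, an in-memory string stands in for the temporary export file, and SQLite stands in for the MySQL archive.

```python
import csv
import io
import sqlite3
from datetime import date

# Hypothetical raw export; the second and third rows are deliberately invalid.
RAW_EXPORT = """request_id,collected,result_text
1001,1998-03-14,Appendix with acute inflammation
1002,1998-02-30,Liver biopsy
abc,1999-07-01,Skin lesion
"""

def clean_row(row):
    """Return a typed tuple, or None if a value is illegal for its data type."""
    try:
        request_id = int(row["request_id"])
        collected = date.fromisoformat(row["collected"])  # rejects impossible dates
    except ValueError:
        return None
    return (request_id, collected.isoformat(), row["result_text"].strip())

rows = list(csv.DictReader(io.StringIO(RAW_EXPORT)))
cleaned = [clean_row(row) for row in rows]
valid = [c for c in cleaned if c is not None]
rejected = [row for row, c in zip(rows, cleaned) if c is None]

archive = sqlite3.connect(":memory:")  # stand-in for the MySQL archive
archive.execute(
    "CREATE TABLE request (request_id INTEGER PRIMARY KEY, "
    "collected TEXT, result_text TEXT)")
archive.executemany("INSERT INTO request VALUES (?, ?, ?)", valid)

loaded = archive.execute("SELECT COUNT(*) FROM request").fetchone()[0]
print(f"loaded {loaded} rows, rejected {len(rejected)}")
```

Rejected rows are kept rather than discarded so that they can be reviewed and repaired before a second import attempt.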
    9. 11. Discussion
          • Initially, the largest part of the task was DMRE
          • Now it is the data migration task
          • Data migration requires generic tools
          • No development of user interfaces, just access via SQL
          • Ideally a GUI is wanted
          • A standard report generator could be used
          • In one case we have provided CliniDAL for ad hoc enquiries
    10. 12. Clinical Data Analytics Language - CliniDAL
          • Provides for any ad hoc query
          • Knows about all variables in the database
          • Indexes all narrative content by SNOMED CT codes
          • Can answer any question answerable by the data in the database
    11. 13. CliniDAL Glossary
          • CliniDAL
          • SNOMED CT and TTSCT
          • Anatomical pathology system
          • SWAPS database
    12. 14. Using TTSCT to identify SNOMED CT concepts within clinical texts
    13. 15. Project objectives
          • Building indices on the SNOMED CT concepts cited across 400,000 text results
          • Porting the CliniDAL program onto the pathology system
    14. 16. CliniDAL indexing process
          • Retrieve the text results in the database and identify each concept in them with the aid of TTSCT.
          • Add the SCT concept id of the text result to the index list.
          • Indexing table structure:
              Headings: Column_name, Data_type, Description
              Column names: Conceptid, Request_id_list, Frequency, Version_id, Index_created_date_time, First_seen_date_time, Last_seen_date_time
              Data types: Text, Text, Integer, Text, Text, Datetime, Datetime, Datetime
          Request_id is used as an internal unique id in the SWAPS database, referring to every single patient medical record. Request_id_list records the list of request ids whose text contains a specific concept, and Frequency records how many times the concept appeared in the corresponding text.
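The indexing process described above amounts to building an inverted index from concept ids to request ids. The sketch below illustrates the idea only. TTSCT itself is not called: `find_concepts` and `TOY_LEXICON` are hypothetical stand-ins that match single words, the sample texts are invented, and the in-memory dictionary is an assumed shape rather than the actual index table. The two concept ids are taken from the top 10 list on slide 18.

```python
import re
from collections import Counter, defaultdict

# Toy stand-in for TTSCT: single word -> SNOMED CT concept id.
# 85756007 = Body tissue structure, 38112003 = One (qualifier value).
TOY_LEXICON = {"tissue": "85756007", "one": "38112003"}

def find_concepts(text):
    """Count how often each known concept is mentioned in one text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(TOY_LEXICON[t] for t in tokens if t in TOY_LEXICON)

def build_index(results):
    """results: {request_id: text} -> {concept_id: {request_id: frequency}}."""
    index = defaultdict(dict)
    for request_id, text in results.items():
        for concept_id, frequency in find_concepts(text).items():
            index[concept_id][request_id] = frequency
    return dict(index)

# Invented text results keyed by request id.
results = {
    101: "One fragment of tissue received. Tissue is soft.",
    102: "No abnormality seen.",
    103: "Fibrous tissue only.",
}
index = build_index(results)

# Which requests mention "Body tissue structure", and how often?
print(index["85756007"])
```

With an index of this shape, a concept-based query becomes a dictionary lookup followed by fetching the listed requests, instead of a scan over every text result.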
    15. 17. Some statistics from the index
          • Total number of processed files: approx. 400,000
          • Total number of unique concepts: over 60,000
          • Total number of concept instances: approx. 68,000,000
          • Average occurrences of SCT concepts: approx. 1126
          • Average concept instances per text: approx. 170
          • The earliest and latest files: 1976-02-05 and 2007-11-07
    16. 18. Most frequent concept top 10
          Concept ID | Concept Description | Occurrences | Frequency
          72845001 | Histocompatibility crossmatch (procedure) | 162009 | 40.50%
          264454007 | Entire peri-ileal tissue (body structure) | 156489 | 39.12%
          322937007 | Entire periesophageal tissue (body structure) | 156489 | 39.12%
          85756007 | Body tissue structure (body structure) | 156488 | 39.12%
          113282007 | Structure of periesophageal tissue (body structure) | 156487 | 39.12%
          81743008 | Structure of peri-ileal tissue (body structure) | 156487 | 39.12%
          272742005 | Tissue used (attribute) | 156482 | 39.12%
          259075005 | mL/s/100mL of tissue (qualifier value) | 156482 | 39.12%
          164600000 | On examination - soft tissue swelling-local (finding) | 156476 | 39.12%
          38112003 | One (qualifier value) | 151357 | 37.84%
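The figures on slides 17 and 18 can be cross-checked with a little arithmetic. The sketch below uses the rounded totals from the slides, so results are approximate. It also assumes that the Frequency column is the occurrence count divided by the roughly 400,000 processed texts; the slides do not state this, but the percentages are consistent with it.

```python
# Rounded totals as quoted on slide 17.
total_texts = 400_000
concept_instances = 68_000_000
average_occurrences = 1126

# Average concept instances per text (slide: approx. 170).
instances_per_text = concept_instances / total_texts

# Number of unique concepts implied by the quoted average (slide: over 60,000).
implied_unique_concepts = concept_instances / average_occurrences

def share_of_texts(occurrences):
    """Occurrences as a percentage of all processed texts (assumed definition)."""
    return round(100 * occurrences / total_texts, 2)

print(instances_per_text)
print(round(implied_unique_concepts))
print(share_of_texts(162009), share_of_texts(156489), share_of_texts(151357))
```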
    17. 19. CliniDAL system architecture
    18. 20. CliniDAL interface
    19. 21. CliniDAL query result sample
    20. 22. CliniDAL on SWAPS
          • Indexing for SCT has been successful
          • CliniDAL is operational
          • Planning for a research study
              • Q1. Common condition - appendicitis
              • Q2. Serious condition - Liver
              • Q3. Rare condition
    21. 23. Conclusions
          • Methodology has been tested on 5 systems
              • Anatomical pathology (2)
              • Breast Screening
              • Blood Bank
              • Cardiology
          • One system has a CliniDAL user interface
          • The data model is properly documented so as to enable SQL access to the data
          • The databases are installed on an open source DBMS, MySQL
    22. 24. Acknowledgements
          • Peng GAO - Cards
          • Xin LI - OMNILAB
          • Hui KE - HOSLAB
          • Kiran Abraham - HOSREP
          • Siti Asma Mohamed - Breast Screening
          • Victor ZHOU - CliniDAL on HOSREP