Understanding Clinical Forms: Structure Discovery and SNOMED CT Annotation


Published in: Business, Technology

  1. Understanding Clinical Forms: Structure Discovery and SNOMED CT Annotation. NCBI, NLM, NIH, May 11, 2012. Ritu Khare, Drexel University College of Medicine, Philadelphia, PA.
  2. Presentation Order
     1. Motivation: a flexible EHR
     2. Form Understanding: Structure Discovery; Form Annotation
     3. Contributions and Plans
  3. Clinicians & Electronic Health Records (EHRs). [Diagram: clinician, electronic health records, IT professionals and vendors.]
     • Data collection needs: inconsistent (Gurses et al., 2009)
     • Integration of new needs: inflexible (Gurses et al., 2009; An et al., 2009)
     • Overall workflow: unintended consequences (Ash et al., 2004; Lee, 2007; Harrison et al., 2007)
  4. The flexible Electronic Health Record (fEHR). Form-based approach (using "forms" as design artifacts):
     1. clinicians' high familiarity and quotient on forms
     2. rich information embedded in forms to guide DB design
     [Diagram: the fEHR system. The clinician ("I want to collect patient's personal information, vital signs, etc.") works through Form Design (or Import), followed by Form Understanding and Mapping, which produce the EHR interface and database.]
  5. The flexible EHR: Key Challenges. [Diagram: the fEHR system, from clinician to EHR interface and database, in three stages.]
     1. Form Design: usability
     2. Form Understanding: information extraction (Structure Discovery, Form Annotation)
     3. Form Mapping: schema and data integration
  6. Presentation Order
     1. Motivation: a flexible EHR
     2. Form Understanding: Form Structure Discovery (Hidden Markov Models); Form Annotation
     3. Contributions and Plans
  7. Structure Discovery. [Figure: a clinical form and the corresponding form tree; legend: text label, format, value.] The form tree accurately captures the contextual associations among the form elements (Dragut et al., 2009; Wu et al., 2009).
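A form tree of this kind can be sketched as a small recursive data structure. This is an illustrative sketch only: the class name `FormNode`, the `kind` vocabulary (taken from the figure legend), and the example form are assumptions, not the presenter's implementation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class FormNode:
    """One form element; `kind` follows the slide legend (assumed spelling)."""
    text: str
    kind: str  # "label", "format", or "value"
    children: List["FormNode"] = field(default_factory=list)

    def add(self, child: "FormNode") -> "FormNode":
        """Attach a child element and return it so calls can be chained."""
        self.children.append(child)
        return child


def edges(node: FormNode) -> List[Tuple[str, str]]:
    """List every parent-child edge in the subtree, depth first."""
    result = []
    for child in node.children:
        result.append((node.text, child.text))
        result.extend(edges(child))
    return result


# Hypothetical fragment of a clinical form.
root = FormNode("root", "label")
patient = root.add(FormNode("Patient", "label"))
gender = patient.add(FormNode("Gender", "label"))
radio = gender.add(FormNode("radio button", "format"))
radio.add(FormNode("M", "value"))
radio.add(FormNode("F", "value"))

form_edges = edges(root)
```

The edge list is the unit in which the later accuracy slide reports its results, which is why the sketch exposes it directly.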
  8. Challenges of Automatic Structure Discovery
     • Forms are designed for human understanding: visual arrangements, past experiences
     • For a machine, a form is an unstructured document; the source code contains only presentation/formatting structure
     • Existing approaches (Zhang et al., 2004; He et al., 2004): short search forms; rules and heuristics
  9. Analysis of the Form Design Process. [Figure: a form annotated with segments (Demographics, Medical decision, Assessment, Orders) and element roles (category label, field, format, misc. text, subcategory label, subfield, subformat).]
     • Elements and their sequence: visible
     • Segment boundaries and roles: hidden and arbitrarily laid out
     The form design process can be modeled using Hidden Markov Models.
  10. Using Hidden Markov Models (HMMs)
     • HMM: a finite state automaton with stochastic state transitions and symbol emissions (Rabiner, 1989); used to model and decode real-world processes that are implicit and unobservable
     • 2-layered HMM:
       T-HMM: assigns tags to elements, e.g., category, field, format, etc.
       S-HMM: creates groups of contextually related elements
     [Figure: the HMM-based artificial designer; the T-HMM emits tags such as category, field, format, and the S-HMM groups them.]
  11. Inner Functionality of the 2-layered HMM
     • Algorithms: supervised training with Expectation Maximization; testing with Viterbi
     • Parser output (observations): text, text area, check box
     • T-HMM states: category, field, format, misc-text, sub-category
     • S-HMM states: begin-segment, inside segment, end-segment, begin sub segment, inside sub segment, end sub segment
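The Viterbi decoding step named above can be illustrated with a toy T-HMM. This is a minimal sketch under stated assumptions: the three states and three observation symbols come from the slides, but every probability below is invented for illustration and the function is not the presenter's code.

```python
import math

# Toy T-HMM. All probabilities are made-up illustration values.
STATES = ["category", "field", "format"]
START = {"category": 0.8, "field": 0.1, "format": 0.1}
TRANS = {
    "category": {"category": 0.1, "field": 0.8, "format": 0.1},
    "field": {"category": 0.1, "field": 0.2, "format": 0.7},
    "format": {"category": 0.3, "field": 0.5, "format": 0.2},
}
EMIT = {
    "category": {"text": 0.9, "textarea": 0.05, "checkbox": 0.05},
    "field": {"text": 0.9, "textarea": 0.05, "checkbox": 0.05},
    "format": {"text": 0.1, "textarea": 0.5, "checkbox": 0.4},
}


def viterbi(observations):
    """Return the most likely state sequence for the observed symbols."""
    # Each column maps state -> (best log-probability, previous state).
    first = observations[0]
    columns = [{s: (math.log(START[s]) + math.log(EMIT[s][first]), None)
                for s in STATES}]
    for obs in observations[1:]:
        column = {}
        for s in STATES:
            prev, score = max(
                ((p, columns[-1][p][0] + math.log(TRANS[p][s])) for p in STATES),
                key=lambda pair: pair[1],
            )
            column[s] = (score + math.log(EMIT[s][obs]), prev)
        columns.append(column)
    # Backtrack from the best final state to the start.
    state = max(STATES, key=lambda s: columns[-1][s][0])
    path = [state]
    for column in reversed(columns[1:]):
        state = column[state][1]
        path.append(state)
    return path[::-1]


tags = viterbi(["text", "text", "textarea", "text", "checkbox"])
```

Working in log space avoids numeric underflow on long forms; the S-HMM layer could be decoded the same way over the tag sequence.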
  12. Tree Generation: Overall Approach
  13. Datasets (52 forms from 6 medical institutions)
     Dataset 1: Walk-in clinic encounter forms (3 forms); avg. #text 32.33, avg. #inputs 49.33
     Dataset 2: Nursing patient admission forms (6 forms); avg. #text 17.17, avg. #inputs 33
     Dataset 3: OB/GYN forms (7 forms); avg. #text 16.14, avg. #inputs 37.29
     Dataset 4: Adult visit encounter forms (18 forms); avg. #text 47.83, avg. #inputs 65.22
     Dataset 5: Family practice forms (13 forms); avg. #text 82.61, avg. #inputs 100.46
     Dataset 6: Child visit encounter forms (5 forms); avg. #text 53, avg. #inputs 67.4
     • HMM training data: T-HMM and S-HMM state sequences for each form. T-HMM: category, field, format, category, field, format, … S-HMM: begin, inside, end, begin, inside, end, …
     • Gold benchmark for result evaluation: 52 gold standard trees
     • Home-grown DIY interface that captures the designer's on-the-fly intentions
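Given labeled state sequences like the ones listed above, transition probabilities can be estimated by counting. Note the hedge: the slides name Expectation Maximization for training, so the relative-frequency estimate with add-one smoothing below is a simple stand-in, not the presenter's procedure, and the two training sequences are invented.

```python
from collections import Counter, defaultdict


def estimate_transitions(sequences, states, smoothing=1.0):
    """Estimate P(next state | previous state) from labeled sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    probs = {}
    for prev in states:
        # Additive smoothing keeps unseen transitions at a small nonzero value.
        total = sum(counts[prev].values()) + smoothing * len(states)
        probs[prev] = {nxt: (counts[prev][nxt] + smoothing) / total
                       for nxt in states}
    return probs


S_STATES = ["begin", "inside", "end"]
# Hypothetical S-HMM state sequences for two forms.
training = [
    ["begin", "inside", "end", "begin", "inside", "inside", "end"],
    ["begin", "inside", "inside", "end", "begin", "end"],
]
trans = estimate_transitions(training, S_STATES)
```

Emission probabilities could be counted the same way from (state, symbol) pairs.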
  14. Results: Tree Extraction (Structure Discovery) Accuracy. HMM testing: cross-validation, leave-one-out method.
     Dataset 1: 272 total tree edges, accuracy 95.22%
     Dataset 2: 362 total tree edges, accuracy 97.51%
     Dataset 3: 461 total tree edges, accuracy 100%
     Dataset 4: 2606 total tree edges, accuracy 97.58%
     Dataset 5: 2674 total tree edges, accuracy 98.46%
     Dataset 6: 644 total tree edges, accuracy 96.11%
     • An average tree with 135 edges is generated in 0.08 seconds.
     • Conclusions: HMMs are very effective for structure discovery and subsume existing approaches.
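The per-dataset figures can be cross-checked with a few lines of arithmetic. The sketch assumes that an overall figure would be an edge-weighted mean of the per-dataset accuracies; the slides do not say how the overall accuracy was aggregated, so that weighting is an assumption.

```python
# Values copied from the results slide.
edges = [272, 362, 461, 2606, 2674, 644]
accuracy = [95.22, 97.51, 100.0, 97.58, 98.46, 96.11]
forms = 52

total_edges = sum(edges)
avg_edges_per_form = total_edges / forms
# Assumption: weight each dataset's accuracy by its number of tree edges.
weighted_accuracy = sum(e * a for e, a in zip(edges, accuracy)) / total_edges
```

Under this assumption the 52 forms hold 7019 edges, about 135 per form as the slide states, and the weighted accuracy comes out near 97.84%, close to the 97.85% quoted on the summary slide.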
  15. Presentation Order
     1. Motivation: a flexible EHR
     2. Form Understanding: Form Structure Discovery (Hidden Markov Models); Form Annotation (Bayesian Classifier)
     3. Contributions and Plans
  16. Form Annotation
     • Semantic heterogeneity across clinical data sources (Halevy, 2005; Henry et al., 1993; Hernandez et al., 2005; Wright et al., 1999). Examples of equivalent terms: MRN / Med Rec # / Medical Record Number; Blood Pressure / Diastolic/Systolic BP; Constitutional / Vital Signs / Physical Status
     • Controlled medical vocabularies should be involved in the design artifacts of healthcare systems (Jean et al., 2007; Sugumaran and Storey, 2002)
     [Diagram: in the fEHR, the form template (design artifact) leads to the EHR database.]
  17. Form Annotation: Clinical Encounter Form to SNOMED CT
     • SNOMED CT: the Systematized Nomenclature of Medicine - Clinical Terms (Intl. Health Terminology Stds. Dev. Org)
     • Most comprehensive clinical vocabulary (SNOMED CT User Guide, 2009)
     • >360,000 logically defined clinical concepts (Hina et al., 2010; Stenzhorn et al., 2009)
     Form term to SNOMED CT concept:
     Patient: 11615400: Patient (person)
     MRN: 398225001: Medical record number (observable entity)
  18. SNOMED CT Concepts
     • Concept id: 0231832. Fully specified name: Respiratory Rate (Observable Entity). Preferred term: Respiratory Rate. Synonym: Respiration Frequency
     • Concept id: 362508001. Fully specified name: Both eyes, entire (Body Structure). Preferred term: Both eyes, entire. Synonym: OU - Both eyes
     • Semantic categories: Attribute, Body Structure, Disorder, Finding, Observable Entity, Occupation, Person, Physical Object, Procedure, Racial Group, Situation, …
  19. Existing Annotation Services. SNOMED CT browsers (Rogers and Bodenreider, 2008): general search; category-specific search.
  20. Form Annotation Challenges
     • Diversity challenge: different clinicians use different terms, e.g., MRN vs. Med. Rec. #; Vital signs vs. Constitutional vs. Physical status
     • Context challenge: the same form term can map to different concepts
  21. Solution Premises
     • The first, i.e., the most string-similar, result retrieved by the category-specific search is usually the desired concept.
     • The key is to identify the SNOMED CT semantic category appropriate for a given term.
     How can the SNOMED CT semantic category appropriate for a given form term be determined automatically?
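The first premise can be sketched as "filter by category, then take the most string-similar name". Everything below is an assumption for illustration: the concept rows are a toy list rather than an extract of a SNOMED CT release, the function names are hypothetical, and `difflib.SequenceMatcher` stands in for whatever similarity measure the real search service uses.

```python
from difflib import SequenceMatcher

# Toy (name, semantic category) rows for illustration only.
CONCEPTS = [
    ("Blood pressure", "observable entity"),
    ("Blood pressure taking", "procedure"),
    ("Respiratory rate", "observable entity"),
    ("Medical record number", "observable entity"),
    ("Patient", "person"),
]


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def category_specific_search(term, category, concepts=CONCEPTS):
    """Return concept names in one category, most string-similar first."""
    candidates = [name for name, cat in concepts if cat == category]
    return sorted(candidates, key=lambda name: similarity(term, name),
                  reverse=True)


def annotate(term, category):
    """Pick the first search result, or None if the category is empty."""
    ranked = category_specific_search(term, category)
    return ranked[0] if ranked else None


as_observable = annotate("blood pressure", "observable entity")
as_procedure = annotate("blood pressure", "procedure")
```

The same form term resolves to different concepts depending on the category, which is why choosing the category correctly carries most of the weight.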
  22. The implicit relationship between the term context (i.e., the form tree) and the desired semantic category can be formally captured in a statistical model.
     • Naïve Bayes classifier: based on the Bayes theorem (Han and Kamber, 2006)
     • Class labels (SNOMED CT semantic categories): attribute, body structure, disorder, …
     • Classification features (local structure): node type, parent node type, child node type, parent semantic category, grandparent semantic category
     [Figure: an example form tree (root; Patient, Examination; Name, Gender, Respiratory; M, F, nl, perc.) with nodes labeled by semantic category: Person, Procedure, Observable Entity, Finding, Qualifier Value.]
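A categorical Naïve Bayes classifier over such structural features can be sketched in a few dozen lines. This is a minimal illustration, not the presenter's model: the class `NaiveBayes`, the two features used, the add-one smoothing, and the seven training rows are all assumptions.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Categorical Naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)  # (label, feature) -> value counts
        self.values = defaultdict(set)              # feature -> values seen in training
        for row, label in zip(rows, labels):
            for feat, val in row.items():
                self.feature_counts[(label, feat)][val] += 1
                self.values[feat].add(val)
        return self

    def predict_proba(self, row):
        """Return category membership probabilities for one node."""
        total = sum(self.class_counts.values())
        log_scores = {}
        for label, n in self.class_counts.items():
            score = math.log(n / total)  # class prior
            for feat, val in row.items():
                count = self.feature_counts[(label, feat)][val]
                denom = n + self.alpha * len(self.values[feat])
                score += math.log((count + self.alpha) / denom)
            log_scores[label] = score
        # Normalize in a numerically stable way.
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        norm = sum(weights.values())
        return {k: w / norm for k, w in weights.items()}


# Hypothetical training nodes: structural features -> semantic category.
rows = [
    {"node_type": "category", "parent_category": "root"},
    {"node_type": "field", "parent_category": "person"},
    {"node_type": "field", "parent_category": "person"},
    {"node_type": "value", "parent_category": "observable entity"},
    {"node_type": "value", "parent_category": "observable entity"},
    {"node_type": "category", "parent_category": "root"},
    {"node_type": "field", "parent_category": "procedure"},
]
labels = ["person", "observable entity", "observable entity",
          "qualifier value", "qualifier value", "procedure",
          "observable entity"]

model = NaiveBayes().fit(rows, labels)
probs = model.predict_proba({"node_type": "field", "parent_category": "person"})
best_category = max(probs, key=probs.get)
```

The returned dictionary plays the role of the "category membership probabilities" that a category picker would consume.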
  23. Form Annotation Algorithm and Implementation
     • Training: manual annotations and form trees provide the training data for the classification model
     • Pipeline: form term → structure analyzer → features → classification model → category membership probabilities → category picker → semantic category → SNOMED CT category-specific search (API) → SNOMED CT concept
     • Hybrid = contextual structure + linguistics
     [Figure: the example form tree with nodes labeled by semantic category.]
  24. Data: Manual (Gold) Annotations. In total, 4235 form terms were manually studied, and 2506 (59%) had a corresponding SNOMED CT concept.
     Dataset 1: Walk-in clinic encounter forms (3 forms); avg. #terms 32.33, SNOMED CT mappability 75.77%
     Dataset 2: Nursing patient admission forms (6 forms); avg. #terms 17.17, SNOMED CT mappability 63.98%
     Dataset 3: Labor & delivery DB data-entry forms (7 forms); avg. #terms 16.14, SNOMED CT mappability 58.8%
     Dataset 4: Adult visit encounter forms (18 forms); avg. #terms 47.83, SNOMED CT mappability 56.2%
     Dataset 5: Family practice forms (13 forms); avg. #terms 82.61, SNOMED CT mappability 59.38%
     Dataset 6: Child visit encounter forms (5 forms); avg. #terms 53, SNOMED CT mappability 62.21%
     • Example annotations (term: concept): Patient: 11615400: Patient (person); MRN: 398225001: Medical record number (observable entity); …
     • Some unmapped terms: no scleral icterus; chronic back pain; Follow up with PCP; Sent to ER
  25. Experiment Design. Goal: to study whether structure can improve annotation performance.
     • Measures: precision = # correct annotations / # annotations; recall = # correct annotations / # gold annotations
     • Baseline (linguistics only): form term → SNOMED CT general search → SNOMED CT concept
     • Hybrid (linguistics + structure): form term → structure analyzer → features → classification model → category membership probabilities → category picker → semantic category → SNOMED CT category-specific search → SNOMED CT concept
     • Hybrid++: the Hybrid pipeline plus candidate set expansion
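The two measures translate directly into code. The sketch below assumes annotations are stored as term-to-concept dictionaries and that the F-score reported with the results is the balanced harmonic mean of precision and recall; the concept ids `c1` to `c4` and `cX` are placeholders, not SNOMED CT identifiers.

```python
def precision_recall(predicted, gold):
    """Compare predicted term->concept annotations against gold ones."""
    correct = sum(1 for term, concept in predicted.items()
                  if gold.get(term) == concept)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


def f_score(precision, recall):
    """Balanced F-score (harmonic mean); assumed to match the slides' F-score."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: four gold annotations, three predictions, two correct.
gold = {"Patient": "c1", "MRN": "c2", "Gender": "c3", "Respiratory": "c4"}
predicted = {"Patient": "c1", "MRN": "c2", "Gender": "cX"}
p, r = precision_recall(predicted, gold)
```

With the precision and recall pairs reported in the results (0.86 and 0.60, then 0.89 and 0.76), `f_score` gives 0.71 and 0.82 after rounding, which matches the reported F-scores.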
  26. Results. Annotation duration per form: 1 to 11 s.
     • Baseline: p = 0.60, r = 0.46
     • Baseline to Hybrid: precision improved 26%
     • Hybrid to Hybrid++: precision improved 13%, recall improved 17%
     • Hybrid++: p = 0.86, r = 0.60 (F-score = 0.71)
     • Term processing component: remove special characters (-, #, /); acronym expansion, e.g., BTL (bilateral tubal ligation), VTE (venous thromboembolism). Precision only slightly improved (3-5%); recall improved substantially (25%). Final: p = 0.89, r = 0.76 (F-score = 0.82)
     • Implications: contextual structure improves the overall annotation performance; linguistics influences only the recall.
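The term processing component can be sketched as a two-step normalizer. This is an assumed implementation: the function name `process_term` and the lookup-table approach are illustrative, and the acronym table holds only expansions that appear in the slides.

```python
import re

# Acronym expansions mentioned in the slides; a real table would be larger.
ACRONYMS = {
    "BTL": "bilateral tubal ligation",
    "VTE": "venous thromboembolism",
    "MRN": "medical record number",
}


def process_term(term, acronyms=ACRONYMS):
    """Strip the special characters -, #, / and expand known acronyms."""
    cleaned = re.sub(r"[-#/]", " ", term)
    # Lookup is case-insensitive, so "vte" expands as well.
    words = [acronyms.get(word.upper(), word) for word in cleaned.split()]
    return " ".join(words)


expanded = process_term("Hx of VTE")
stripped = process_term("Med-Rec #")
```

Running the search on the processed term instead of the raw label gives string matching more to work with, which is consistent with the recall gain reported above.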
  27. Presentation Order
     1. Motivation: a flexible EHR
     2. Form Understanding: Form Structure Discovery (Hidden Markov Models); Form Annotation (Bayesian Classifier)
     3. Contributions and Plans
  28. Summary: Clinical Form Understanding
     1. Structure Discovery
        • Hidden Markov Models
        • High accuracy (97.85%)
        • Limitations: supervised learning; weak entities and other constraints; advanced form features
     2. SNOMED CT Annotation
        • Naïve Bayes classifier
        • 0.89 (precision) and 0.76 (recall)
        • Structure helps improve annotation: 43% precision, 29% recall
        • Limitations: supervised learning; leverages limited semantics from SNOMED CT
     Related publications: CIKM 2009, SIGMOD Record 2010, IHI 2010, ER 2011, IHI 2012
  29. Application: the flexible EHR. [Diagram: the fEHR system in three stages: (1) Design or Import, (2) Form Understanding, (3) Mapping Algorithms, connecting the clinician's form to the EHR database.]
     • Mapping algorithms: discover semantic correspondences; evolve the existing database
     • Experiments: 52 forms (from 6 clinics) generate 6 databases (35-450)
     • Annotation helps improve the integration process (database quality by 13%, merging scenario identification by 19%)
  30. Other Applications
     • Structure Discovery: Web search form understanding; deep Web visibility; meta-search engines; usable on any domain (movies, health, automobile, …); biological forms
     • SNOMED CT Annotation: clinical form-driven database design process; database elements are named after form terms; prepares databases for future integration
  31. Current and Future Projects
     • Improving form annotation: involve an expert annotator to prepare gold standards; specialty-specific forms (OB/GYN); use other UMLS terminologies; post-coordinated mapping
     • Unstructured EHR/Web data: extract structure from narrative data (visit notes, discharge summaries); error control algorithms
     [Figure: a typical patient visit note (created by a physician).]
  32. Acknowledgements
     • Computer and Information Scientists: Dr Yuan An, Dr Tony Hu, Dr Jason Li, Dr Min Song, Dr Il-Yeol Song, Dr Christopher Yang
     • Physicians and Clinical Researchers: Dr Prudence Dalrymple, Dr Kalatu Davies, Dr Michele Follen, Dr Sandra Hartmann, Dr Paul Nyirjesy, Dr Sandra Wolf
  33. Thank you