Electronic Medical Records for Clinical Research.ppt



  1. 1. The electronic medical record (EMR) will constitute the core of a computerized health care system in the near future. The electronic storage of clinical information will create the potential for computer-based tools to help clinicians significantly enhance the quality of medical care and increase the efficiency of medical practice. These tools may include reminder systems that identify patients who are due for preventative care interventions, alerting systems that detect contraindications among prescribed medications, and coding systems that facilitate the selection of correct billing codes for patient encounters. Numerous other "decision-support" tools have been developed and may soon facilitate the practice of clinical medicine. The potential of such tools will not be realized, however, if the EMR is just a set of textual documents stored in a computer, i.e. a "word-processed" patient chart. To support intelligent and useful tools, the EMR must have a systematic internal model of the information it contains and must support the efficient capture of clinical information in a manner consistent with this model. Although commercially available EMR systems that have such features are appearing, the builders and the buyers of EMR systems must continue to focus on the proper design of these systems if the benefits of computerization are to be fully realized. (Sujansky WV. The Benefits and Challenges of an Electronic Medical Record: Much More than a "Word-Processed" Patient Chart. West J Med 1998; 169:176-183) © COPYRIGHT 1998 British Medical Association Electronic Medical Records
  2. 2. <ul><li>Electronic medical records (EMR) </li></ul><ul><ul><li>clinical benefits </li></ul></ul><ul><ul><ul><li>reduction in medical errors, prescription errors </li></ul></ul></ul><ul><ul><ul><li>supports quality improvement programs </li></ul></ul></ul><ul><ul><li>research benefits </li></ul></ul><ul><ul><ul><li>“Frankly, one of the biggest attractions to LastWord is going to be a boon to clinical research. Information will be accessible in a much more uniform and complete way.” Haile Debas, Daybreak, Feb. 2, 2001 </li></ul></ul></ul><ul><li>UCSF spending $50 million over next 2 years on CareCast </li></ul><ul><li>How real is the promise of EMRs for research? </li></ul>Background
  3. 3. <ul><li>Understand key properties of useful electronic medical records and data warehousing </li></ul><ul><ul><li>free vs. coded entry </li></ul></ul><ul><ul><li>importance of a standardized clinical vocabulary </li></ul></ul><ul><li>Understand implications of database technologies on clinical research </li></ul><ul><li>Be familiar with basic concepts in data security and privacy </li></ul>Learning Objectives
  4. 4. <ul><li>Example Study </li></ul><ul><ul><li>a single-institution outcomes research question </li></ul></ul><ul><li>Electronic Medical Records (EMRs) </li></ul><ul><ul><li>relational databases </li></ul></ul><ul><ul><li>vocabulary </li></ul></ul><ul><li>Data Warehousing </li></ul><ul><li>Security and Privacy </li></ul>Outline
  5. 5. <ul><li>Retrospective analysis </li></ul><ul><li>Compare 1-year re-admission rate for acute MI for </li></ul><ul><ul><li>diabetics admitted with acute MI, discharged </li></ul></ul><ul><ul><ul><li>on β-blockers </li></ul></ul></ul><ul><ul><ul><li>not on β-blockers </li></ul></ul></ul><ul><li>First acute MI in 1999 to 2001, follow-up to 2002 </li></ul>An Outcomes Research Project
  6. 6. <ul><li>Find diabetics admitted with AMI, 1999 to 2001 </li></ul><ul><li>Find whether D/C’ed on β-blocker </li></ul><ul><li>For these patients, find all re-admissions in the year following the index MI </li></ul><ul><ul><li>identify re-admissions that were for acute MI </li></ul></ul><ul><li>Analyze </li></ul><ul><ul><li>predictor = β-blocker status </li></ul></ul><ul><ul><li>primary outcome = acute MI readmission rate </li></ul></ul><ul><ul><li>secondary outcome = length of stay (LOS) </li></ul></ul>Study Steps
  7. 7. <ul><li>Data needed </li></ul><ul><ul><li>admission: Admission Discharge Transfer system </li></ul></ul><ul><ul><li>diabetes diagnosis: chart, HgbA1C </li></ul></ul><ul><ul><li>MI diagnosis: chart, troponins, EKG readings </li></ul></ul><ul><ul><ul><li>or just trust coding of admission diagnosis? </li></ul></ul></ul><ul><ul><li>β-blocker usage: orders, pharmacy </li></ul></ul><ul><li>Existing (legacy) systems </li></ul><ul><ul><li>claims, pharmacy, ADT, lab, xray, med record, etc </li></ul></ul>Data Needed for β-Blocker Study Health System Minnesota: 50 paper, 50 computer 200,000 lives, 460 physicians
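The study steps above can be sketched in a few lines of Python. This is only an illustration on hand-made toy records: the field names (`patient_id`, `discharge`, `beta_blocker`, `admit`, `diagnosis`) and the helper `ami_readmission_rates` are assumptions made for the example, not part of any real EMR, and the resulting rates say nothing about actual patients.

```python
from datetime import date, timedelta

# Hypothetical index admissions: one row per diabetic patient admitted with AMI.
index_admissions = [
    {"patient_id": 1, "discharge": date(1999, 3, 1), "beta_blocker": True},
    {"patient_id": 2, "discharge": date(2000, 6, 15), "beta_blocker": True},
    {"patient_id": 3, "discharge": date(2000, 9, 9), "beta_blocker": False},
    {"patient_id": 4, "discharge": date(2001, 1, 20), "beta_blocker": False},
]

# Hypothetical later admissions, with a coded admission diagnosis.
readmissions = [
    {"patient_id": 2, "admit": date(2001, 12, 1), "diagnosis": "AMI"},  # more than 1 year later
    {"patient_id": 3, "admit": date(2001, 2, 2), "diagnosis": "AMI"},   # within 1 year
    {"patient_id": 4, "admit": date(2001, 5, 5), "diagnosis": "CHF"},   # not an AMI
]


def ami_readmission_rates(index_rows, readmit_rows, window_days=365):
    """Return {beta_blocker_status: fraction readmitted for AMI within the window}."""
    counts = {}  # status -> [readmitted, total]
    for idx in index_rows:
        window_end = idx["discharge"] + timedelta(days=window_days)
        readmitted = any(
            r["patient_id"] == idx["patient_id"]
            and r["diagnosis"] == "AMI"
            and idx["discharge"] < r["admit"] <= window_end
            for r in readmit_rows
        )
        tally = counts.setdefault(idx["beta_blocker"], [0, 0])
        tally[0] += int(readmitted)
        tally[1] += 1
    return {status: hit / total for status, (hit, total) in counts.items()}


rates = ami_readmission_rates(index_admissions, readmissions)
print(rates)
```

The hard part in practice is not this loop but filling the input tables reliably, which is what the rest of the talk is about.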
  8. 8. Data Collection Method
  9. 9. <ul><li>EMR provides individual patient data for </li></ul><ul><ul><li>real-time clinical care </li></ul></ul><ul><ul><li>reimbursement (eg for E&M coding) </li></ul></ul><ul><ul><li>see table for major functionality dimensions </li></ul></ul><ul><li>Clinical workstation includes interfaces to </li></ul><ul><ul><li>practice management systems </li></ul></ul><ul><ul><li>pharmacy benefit management </li></ul></ul><ul><ul><li>knowledge resources (e.g., WWW, guidelines) </li></ul></ul><ul><li>“ EMRs” range from flat file, text-based systems to full-featured workstations </li></ul>What is an EMR?
  10. 10. 8 Types of EMR Functionality
  11. 11. <ul><li>Physician friendliness </li></ul><ul><ul><li>if docs won’t use it, it won’t help research </li></ul></ul><ul><li>What data it contains </li></ul><ul><li>How that data is stored (and retrieved) </li></ul><ul><li>Security </li></ul><ul><li>Cost, maintenance, technical support, etc </li></ul>Critical EMR Features
  12. 12. <ul><li>Workflow compatible </li></ul><ul><ul><li>portable </li></ul></ul><ul><li>Easy data entry </li></ul><ul><ul><li>voice-recognition </li></ul></ul><ul><ul><li>pen-based (PDAs) </li></ul></ul><ul><ul><li>digital ink </li></ul></ul><ul><li>Preserves doctor-patient relationship </li></ul><ul><li>Secure </li></ul>Physician Friendliness Fujitsu 510
  13. 13. <ul><li>Contents: data and detail sufficient for </li></ul><ul><ul><li>real-time clinical care </li></ul></ul><ul><ul><ul><li>notes, orders, labs, prescriptions, xray (reports)... </li></ul></ul></ul><ul><ul><li>administration </li></ul></ul><ul><ul><ul><li>demographic, billing, provider IDs... </li></ul></ul></ul><ul><ul><li>research? </li></ul></ul><ul><ul><ul><li>standardized data collection, symptom scales, etc </li></ul></ul></ul><ul><li>Structure: generally should store contents in relational form </li></ul><ul><ul><li>unstructured free text (flat file) difficult to compute on </li></ul></ul><ul><ul><li>relational data schema provides structure to the EMR data </li></ul></ul><ul><ul><ul><li>e.g., fields for diagnosis, medication name, dosage </li></ul></ul></ul>EMR Contents and Structure
  14. 14. Relational Admissions Database
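A relational admissions database of the kind named on this slide might look like the following minimal sketch, using Python's built-in `sqlite3` module. The table and column names are assumptions chosen for illustration, and the two admissions are invented; the diagnosis column holds ICD-9 codes (410.x for acute MI, 428.0 for congestive heart failure).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient (
    patient_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE admission (
    admission_id   INTEGER PRIMARY KEY,
    patient_id     INTEGER NOT NULL REFERENCES patient(patient_id),
    admit_date     TEXT NOT NULL,   -- ISO 8601 date string
    discharge_date TEXT,
    diagnosis_code TEXT NOT NULL    -- coded admission diagnosis (ICD-9 here)
);
""")

# Hypothetical rows for illustration only.
conn.execute("INSERT INTO patient VALUES (1, 'Jones')")
conn.executemany(
    "INSERT INTO admission VALUES (?, ?, ?, ?, ?)",
    [
        (10, 1, "1999-04-02", "1999-04-09", "410.11"),
        (11, 1, "2000-01-15", "2000-01-18", "428.0"),
    ],
)

# Because diagnosis is its own field, finding acute MI admissions is one query.
rows = conn.execute(
    "SELECT p.name, a.admit_date FROM admission a "
    "JOIN patient p ON p.patient_id = a.patient_id "
    "WHERE a.diagnosis_code LIKE '410%' ORDER BY a.admit_date"
).fetchall()
print(rows)
```

Splitting patients and admissions into separate tables joined on `patient_id` is the usual relational design choice: patient details are stored once, however many admissions there are.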
  15. 15. What Goes Into the Table Cells? <ul><li>If the entire chart were stored in relational tables, all the chart information (including HPI) would be in the cells </li></ul><ul><li>Free vs. coded entries </li></ul><ul><ul><li>“Mrs. Jones suffered an anterior non-Q wave MI” vs </li></ul></ul><ul><ul><li>MI: Yes , Location: Anterior , Type: Non-Q </li></ul></ul><ul><li>Structure and coding are essential for making the EMR more machine interpretable </li></ul><ul><ul><li>free text entries in structured fields better than plain flat file </li></ul></ul><ul><ul><li>even better to code entries into standardized terms </li></ul></ul>
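To make the free vs. coded distinction concrete, here is a small sketch comparing a naive text search with a lookup on coded fields. The notes and the field names (`mi`, `location`, `type`) are invented for the example; the point is only that string matching both misses true cases and picks up negated ones.

```python
# Three hypothetical chart entries, stored as free text...
free_text_notes = [
    "Mrs. Jones suffered an anterior non-Q wave MI",
    "Pt admitted with acute myocardial infarction, inferior wall",
    "No evidence of MI on EKG",
]

# ...and the same three entries stored as coded fields.
coded_entries = [
    {"mi": True, "location": "anterior", "type": "non-Q"},
    {"mi": True, "location": "inferior", "type": None},
    {"mi": False, "location": None, "type": None},
]

# Naive free-text search: misses the spelled-out MI and matches the negated one.
naive_hits = [note for note in free_text_notes if "MI" in note]

# Coded search: a simple, unambiguous field test.
coded_hits = [entry for entry in coded_entries if entry["mi"]]

print(len(naive_hits), "free-text hits (one false positive, one miss)")
print(len(coded_hits), "coded hits")
```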
  16. 16. <ul><li>A term is a designation of a concept or an object in a specific vocabulary </li></ul><ul><ul><ul><li>e.g., English blood = German blut </li></ul></ul></ul><ul><li>Standardization required for communication </li></ul><ul><ul><li>acts like a dictionary </li></ul></ul><ul><ul><ul><li>DGIM tried to use STOR to pull out all CHF patients for quality improvement program but terms used were too varied </li></ul></ul></ul><ul><ul><ul><li>i.e., how to guarantee that all acute MI admissions will be retrieved if asked for? </li></ul></ul></ul><ul><li>Vocabularies (collections of terms) </li></ul><ul><ul><li>general standardized: ICD-9, CPT, MeSH </li></ul></ul><ul><ul><li>research-domain specific: CDEs for cancer, etc... </li></ul></ul><ul><ul><li>your own data dictionary </li></ul></ul>Standardization of Clinical Terms
  17. 17. Cost/Benefits of Coding <ul><li>The more coded and more structured your data, the more advanced computing you can do with that data </li></ul><ul><ul><li>because the computer can “understand” more </li></ul></ul><ul><li>But coding and structuring costs time and effort </li></ul><ul><ul><li>selecting billing codes for outpatient practice </li></ul></ul><ul><ul><li>structured templates for clinic notes may be too constraining </li></ul></ul><ul><li>Tradeoff between </li></ul><ul><ul><li>costs of more coding and structuring, and </li></ul></ul><ul><ul><li>benefits to accrue from “smarter” computing </li></ul></ul>
  18. 18. Notable Clinical Vocabularies
  19. 19. Dangers of ICD-9 Coding <ul><li>VBAC uterine rupture rate </li></ul><ul><ul><li>665.0 and 665.1 ICD-9 discharge codes used in study (NEJM 2001;345:3-8) </li></ul></ul><ul><ul><li>letter to editor: in 9 years of Massachusetts data </li></ul></ul><ul><ul><ul><li>716 patients with 665.0 and 665.1 discharged </li></ul></ul></ul><ul><ul><ul><li>reviewed 709 charts </li></ul></ul></ul><ul><ul><ul><li>363 (51.2%) had actual uterine rupture </li></ul></ul></ul><ul><ul><ul><li>others had incidental extensions of C-section incision, or were incorrectly coded or typed </li></ul></ul></ul><ul><ul><ul><li>674.1 (dehiscence of the uterine wound) also used to code another 197 ruptures (or 35% of confirmed cases of uterine rupture) </li></ul></ul></ul><ul><li>Administrative codes are not ideal for research </li></ul>
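The numbers on this slide can be read as rough accuracy measures for the 665.0/665.1 codes. The sketch below redoes the arithmetic; it assumes that the "confirmed cases" total is the 363 ruptures found under 665.x plus the 197 coded as 674.1, which is the reading that reproduces the slide's 35% figure.

```python
charts_reviewed = 709        # charts with discharge code 665.0 or 665.1
true_ruptures_665 = 363      # of those, actual uterine ruptures
ruptures_coded_674_1 = 197   # ruptures found under code 674.1 instead

# Fraction of 665.x-coded charts that were real ruptures (positive predictive value).
ppv = true_ruptures_665 / charts_reviewed

# Assumed total of confirmed ruptures across both code groups.
confirmed_total = true_ruptures_665 + ruptures_coded_674_1

# Fraction of confirmed ruptures that a 665.x-only search would miss, and would catch.
share_missed = ruptures_coded_674_1 / confirmed_total
sensitivity = true_ruptures_665 / confirmed_total

print(f"PPV of 665.x codes: {ppv:.1%}")
print(f"Ruptures missed by 665.x-only search: {share_missed:.1%}")
print(f"Sensitivity of 665.x codes: {sensitivity:.1%}")
```

On these figures a search on 665.0/665.1 alone is wrong about half the time and still misses roughly a third of the true ruptures.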
  20. 20. ICD-9 Concept Coverage <ul><li>How well would ICD-9 do in capturing a medical chart? </li></ul><ul><li>Inpatient and outpatient charts from 4 medical centers abstracted into 3061 concepts [Chute, 96] </li></ul><ul><ul><li>diagnoses, modifiers, findings, treatments and procedures, other </li></ul></ul><ul><li>Matching: 0=no match, 1=partial, 2=complete </li></ul><ul><ul><li>1.60 for diagnoses </li></ul></ul><ul><ul><li>0.77 overall </li></ul></ul><ul><ul><li>ICD-9 augmented with CPT: overall 0.82 </li></ul></ul>
  21. 21. UMLS <ul><li>A meta-thesaurus of over 40 English and non-English vocabularies </li></ul><ul><ul><li>SNOMED, MeSH, ICD-9, CPT, DSM, Read code, etc. </li></ul></ul><ul><ul><li>designates a UMLS preferred term </li></ul></ul><ul><ul><ul><li>e.g., “Atrial Fibrillation” is preferred over </li></ul></ul></ul><ul><ul><ul><ul><li>a fib, afib, or AF </li></ul></ul></ul></ul><ul><ul><ul><ul><li>auricular fibrillation, or ushka predserdiia fibrilliatsiia </li></ul></ul></ul></ul><ul><li>UMLS terms categorized into 55 semantic types </li></ul><ul><ul><li>e.g., signs and symptoms, biologic function, chemicals, finding, pathologic function </li></ul></ul><ul><li>Also links concepts together </li></ul><ul><ul><li>Atrial Fibrillation is-a Cardiovascular Disease </li></ul></ul>
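The two ideas on this slide, mapping synonyms to a preferred term and linking concepts with is-a relations, can be illustrated with plain dictionaries. This is a toy stand-in and not the UMLS data model or any UMLS API; the tables `PREFERRED_TERM` and `IS_A` and the helper names are assumptions for the example, populated only with the terms mentioned above.

```python
# Toy synonym table: normalized string -> preferred term.
PREFERRED_TERM = {
    "atrial fibrillation": "Atrial Fibrillation",
    "a fib": "Atrial Fibrillation",
    "afib": "Atrial Fibrillation",
    "af": "Atrial Fibrillation",
    "auricular fibrillation": "Atrial Fibrillation",
}

# Toy concept hierarchy: concept -> parent concept.
IS_A = {"Atrial Fibrillation": "Cardiovascular Disease"}


def normalize(term):
    """Return the preferred term for a string, or None if it is not covered."""
    key = " ".join(term.lower().split())  # lowercase and collapse whitespace
    return PREFERRED_TERM.get(key)


def ancestors(concept):
    """Follow is-a links upward and return the chain of broader concepts."""
    chain = []
    while concept in IS_A:
        concept = IS_A[concept]
        chain.append(concept)
    return chain


print(normalize("  A  Fib "))
print(ancestors("Atrial Fibrillation"))
print(normalize("SOB"))  # not in this toy table
```

With such a mapping, a query for "Cardiovascular Disease" can also retrieve records entered as "afib", which is what makes coded entries useful for research retrieval.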
  22. 22. UMLS Semantic Coverage <ul><li>1996 UMLS with ~30 vocabularies (Humphreys) </li></ul><ul><ul><li>32,679 normalized strings submitted (80% for EMR) </li></ul></ul><ul><ul><ul><li>58% exact concept found </li></ul></ul></ul><ul><ul><ul><li>28% related to broader concept but modifications not found </li></ul></ul></ul><ul><ul><ul><li>13% related concept found </li></ul></ul></ul><ul><ul><ul><li>1% not found </li></ul></ul></ul><ul><ul><li>semantic coverage varied from 45% to 71% </li></ul></ul><ul><li>SNOMED International and Read did the best </li></ul><ul><li>Bottom line: current vocabularies cannot fully capture all the clinical concepts in medical charts </li></ul>
  23. 23. Research Data Dictionaries <ul><li>Research data dictionaries are lists of study variables and their definitions </li></ul><ul><li>Standardization of data dictionaries facilitates data sharing, merging, and meta-analysis </li></ul><ul><li>Terms in a data dictionary should ideally come from a standard clinical vocabulary </li></ul><ul><ul><li>e.g., SOB? shortness of breath? breathlessness? </li></ul></ul><ul><ul><ul><li>ICD-9: Dyspnea and other respiratory abnormalities (786.0) </li></ul></ul></ul><ul><ul><ul><li>CPT: no matching concept or term </li></ul></ul></ul><ul><ul><ul><li>UMLS: Dyspnea is preferred term </li></ul></ul></ul>
  24. 24. Notable Research Data Dictionaries <ul><li>Common Data Elements (from the NCI) </li></ul><ul><ul><li>standardized study variables for breast, lung, cervical, prostate cancer </li></ul></ul><ul><ul><li> </li></ul></ul><ul><li>HCFA’s MedQuest modules </li></ul><ul><ul><li>domain specific data dictionaries </li></ul></ul><ul><ul><ul><li>a fib, CHF, diabetes, pneumonia, orthopedics, etc. </li></ul></ul></ul><ul><li>Other domain specific ones? </li></ul><ul><ul><li>prospective meta-analysis movement attempting to disseminate common data dictionaries </li></ul></ul>
  25. 25. Common Data Elements Example <ul><li>Menopausal Status: “Indication of whether a woman is potentially fertile or not.” </li></ul><ul><li>Allowed values: </li></ul><ul><ul><li>Post (Prior bilateral ovariectomy, OR >12 mo since LMP with no prior hysterectomy and not currently receiving therapy with LH-RH analogs [e.g., Zoladex]) </li></ul></ul><ul><ul><li>Post (Prior bilateral ovariectomy, OR >12 mo since LMP with no prior hysterectomy) </li></ul></ul><ul><ul><li>Pre (<6 mo since LMP AND no prior bilateral ovariectomy, AND not on estrogen replacement) </li></ul></ul><ul><ul><li>Above categories not applicable AND Age < 50 </li></ul></ul><ul><ul><li>Above categories not applicable AND Age >=50 </li></ul></ul>
  26. 26. EMR for Research Summary <ul><li>An EMR is not automatically going to help clinical research </li></ul><ul><ul><li>if it’s all unstructured free text, it won’t help much at all </li></ul></ul><ul><ul><ul><li>the more structured it is (ie more defined fields), the better </li></ul></ul></ul><ul><ul><li>if it’s just coded sporadically in ICD-9 </li></ul></ul><ul><ul><ul><li>problem with gamed codes </li></ul></ul></ul><ul><ul><ul><li>poor coverage of many clinical concepts </li></ul></ul></ul><ul><ul><li>if it’s coded in SNOMED </li></ul></ul><ul><ul><ul><li>some clinical concepts still not well covered </li></ul></ul></ul><ul><ul><ul><li>now nationally site licensed, but </li></ul></ul></ul><ul><li>EMR better than chart review; can we do even better? </li></ul>
  27. 27. <ul><li>Sample Study </li></ul><ul><ul><li>a single-institution outcomes research question </li></ul></ul><ul><li>Electronic Medical Records (EMRs) </li></ul><ul><ul><li>relational databases </li></ul></ul><ul><ul><li>vocabulary </li></ul></ul><ul><li>Data Warehousing </li></ul><ul><li>Security and Privacy </li></ul>Outline
  28. 28. Types of Queries <ul><li>Clinical care </li></ul><ul><li>What was Mr. Smith’s last potassium? </li></ul><ul><li>Does he have an old CXR for comparison? </li></ul><ul><li>What antihypertensives has he been on before? </li></ul><ul><li>What did the neurology consult say about his epilepsy? </li></ul><ul><li>Research </li></ul><ul><li>What % of diabetics with AMI admissions were discharged on β-blockers? </li></ul><ul><li>What was the average Medicine length of stay in 2000 compared to 1995? </li></ul><ul><li>What is the trend in use of head CTs in patients with migraine? </li></ul><ul><li>Is admission creatinine an independent predictor of bacteremia outcomes? </li></ul>
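The difference between the two query types shows up clearly in SQL: a clinical-care query looks up one patient, while a research query aggregates over many. The sketch below uses Python's `sqlite3` with invented tables and values; the schema (`lab`, `admission`, and the flag columns) is an assumption for illustration, not a real EMR layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lab (patient_id INTEGER, test TEXT, value REAL, drawn_at TEXT);
CREATE TABLE admission (
    admission_id INTEGER PRIMARY KEY,
    patient_id   INTEGER,
    diabetic     INTEGER,                 -- 1 = yes, 0 = no
    ami          INTEGER,                 -- 1 = admitted for acute MI
    discharged_on_beta_blocker INTEGER    -- 1 = yes, 0 = no
);
""")
# Hypothetical data for illustration only.
conn.executemany("INSERT INTO lab VALUES (?, ?, ?, ?)", [
    (1, "K", 3.9, "2001-05-01"),
    (1, "K", 4.5, "2001-05-03"),
    (2, "K", 5.1, "2001-05-02"),
])
conn.executemany("INSERT INTO admission VALUES (?, ?, ?, ?, ?)", [
    (10, 1, 1, 1, 1),
    (11, 2, 1, 1, 0),
    (12, 3, 1, 1, 1),
    (13, 4, 0, 1, 1),   # not diabetic, so excluded from the research query
])

# Clinical care: one patient, most recent value.
last_potassium = conn.execute(
    "SELECT value FROM lab WHERE patient_id = ? AND test = 'K' "
    "ORDER BY drawn_at DESC LIMIT 1",
    (1,),
).fetchone()[0]

# Research: aggregate across all qualifying admissions.
pct_on_beta_blocker = conn.execute(
    "SELECT 100.0 * SUM(discharged_on_beta_blocker) / COUNT(*) "
    "FROM admission WHERE diabetic = 1 AND ami = 1"
).fetchone()[0]

print(last_potassium, round(pct_on_beta_blocker, 1))
```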
  29. 29. What is a Data Warehouse? Data Warehouse Internet ADT Chem EMR XRay PMB Claims <ul><li>Integrated historical data common to entire enterprise </li></ul>MICU Finance Research QA
  30. 30. Types of Data Warehouses <ul><li>A data warehouse is just a collection of data from other databases </li></ul><ul><ul><li>is itself just a database </li></ul></ul><ul><li>Two somewhat distinct types </li></ul><ul><ul><li>clinical data repository </li></ul></ul><ul><ul><ul><li>collects data from day-to-day clinical care, admin data, etc. </li></ul></ul></ul><ul><ul><ul><li>for quality improvement, outcomes research, business decision making… </li></ul></ul></ul><ul><ul><li>research data repository </li></ul></ul><ul><ul><ul><li>collects data from multiple research projects </li></ul></ul></ul><ul><ul><ul><li>may also collect data from day-to-day clinical care, admin data, etc. </li></ul></ul></ul>
  31. 31. Data Warehouses: Hype and Hope <ul><li>Touted for </li></ul><ul><ul><li>business decision making </li></ul></ul><ul><ul><li>health care quality improvement </li></ul></ul><ul><ul><li>outcomes research </li></ul></ul><ul><ul><li>genotype-phenotype correlations for translational research </li></ul></ul><ul><li>Clinical and Genomic Information Management (CGIM) database </li></ul><ul><ul><li>UCSF partnership with IBM </li></ul></ul><ul><ul><li>$4-6 million over 3 years </li></ul></ul><ul><ul><li>goal: a single repository of research data from all UCSF research projects, plus data from STOR, radiology, etc. (maybe CareCast) </li></ul></ul><ul><ul><li>to enable </li></ul></ul><ul><ul><ul><li>analyses and data mining across data sets </li></ul></ul></ul><ul><ul><ul><li>correlation of clinical, genomic, imaging, etc data </li></ul></ul></ul>
  32. 32. <ul><li>Need many types of data for research and QI </li></ul><ul><li>E.g., for our outcomes study, need </li></ul><ul><ul><li>admission: ADT (admission/discharge/transfer) system </li></ul></ul><ul><ul><li>diabetes diagnosis: e-chart, HgbA1C </li></ul></ul><ul><ul><li>MI diagnosis: e-chart, troponins, EKG readings </li></ul></ul><ul><ul><li>β-blocker usage: online ordering, pharmacy system </li></ul></ul><ul><li>Existing (legacy) systems </li></ul><ul><ul><li>claims, pharmacy, ADT, lab, xray, med record, etc </li></ul></ul><ul><ul><li>HealthSystems Minnesota with 50 computer systems, 50 paper systems </li></ul></ul>Why are Data Warehouses Useful? Health System Minnesota: 50 paper, 50 computer 200,000 lives, 460 physicians
  33. 33. <ul><li>Extract data from legacy systems </li></ul><ul><li>Clean data and feed it to warehouse </li></ul><ul><li>Allow ad hoc use </li></ul><ul><ul><li>data query, data mining, data analysis </li></ul></ul><ul><li>Service users </li></ul><ul><ul><li>modify data content based on queries </li></ul></ul><ul><ul><li>provide standard reports </li></ul></ul><ul><ul><li>provide alerts to trends </li></ul></ul>Data Warehousing Procedure
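The extract, clean, and load steps above can be sketched as follows. The two "legacy systems" are just lists of dictionaries with deliberately inconsistent field names, date formats, and terms; every name here (`adt_rows`, `pharmacy_rows`, `TERM_MAP`, `clean_adt`, `clean_pharmacy`) is hypothetical and stands in for whatever the real source systems provide.

```python
from datetime import datetime

# Extract: rows as they might come out of two hypothetical legacy systems.
adt_rows = [
    {"MRN": "0012", "ADMIT": "03/01/1999", "DX": "MI"},
    {"MRN": "0047", "ADMIT": "11/20/2000", "DX": "myocardial infarction"},
]
pharmacy_rows = [
    {"mrn": 12, "drug": " Metoprolol", "filled": "1999-03-08"},
]

# Common naming of data items: map local abbreviations to one agreed term.
TERM_MAP = {"mi": "myocardial infarction"}


def clean_adt(row):
    """Normalize patient ID, date format, and diagnosis term for an ADT row."""
    dx = row["DX"].strip().lower()
    return {
        "patient_id": int(row["MRN"]),
        "admit_date": datetime.strptime(row["ADMIT"], "%m/%d/%Y").date().isoformat(),
        "diagnosis": TERM_MAP.get(dx, dx),
    }


def clean_pharmacy(row):
    """Normalize patient ID and drug name for a pharmacy row."""
    return {
        "patient_id": int(row["mrn"]),
        "drug": row["drug"].strip().lower(),
        "fill_date": row["filled"],
    }


# Load: the "warehouse" is a dict of cleaned tables in this sketch.
warehouse = {
    "admission": [clean_adt(r) for r in adt_rows],
    "dispensing": [clean_pharmacy(r) for r in pharmacy_rows],
}

# Ad hoc use: which MI patients were dispensed metoprolol (a β-blocker)?
mi_patients = {
    r["patient_id"]
    for r in warehouse["admission"]
    if r["diagnosis"] == "myocardial infarction"
}
mi_on_metoprolol = sorted(
    pid
    for pid in mi_patients
    if any(d["patient_id"] == pid and d["drug"] == "metoprolol"
           for d in warehouse["dispensing"])
)
print(mi_on_metoprolol)
```

The cross-system question at the end is only answerable because the cleaning step gave both sources the same patient identifier and vocabulary.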
  34. 34. <ul><li>Requires physical networking and transmission standards (protocols) </li></ul>Networking Warehouse Internet ADT Chem EMR XRay PMB Claims MICU Finance Research QA
  35. 35. Prerequisites for Large-Scale Medical Data Merging <ul><li>Health-specific network protocols needed </li></ul><ul><ul><li>Health-Level 7 (HL-7) </li></ul></ul><ul><ul><ul><li>to provide standards for the exchange, management and integration of data that support clinical patient care and the management, delivery and evaluation of healthcare services </li></ul></ul></ul><ul><ul><li>Digital Imaging and Communications in Medicine (DICOM) </li></ul></ul><ul><ul><ul><li>common data exchange format for medical images </li></ul></ul></ul>
  36. 36. HL-7 Version 2.x Example <ul><li>MSH|… message header </li></ul><ul><li>PID|… patient identifier </li></ul><ul><li><!-OBR…observation request> </li></ul><ul><li>OBR|1|870930010^OE|CM3562^LAB|80004^ELECTROLYTES|R|198703281530|198703290800||| 401-0^INTERN^JOE^^^^MD^L|N|||||SER|^SMITH^RICHARD^W.^^^DR.|(319)377-4400| </li></ul><ul><li>This is requestor field #1. Requestor field #2|Diag.serv.field #1.|Diag.serv.field #2.|198703311400|||F<CR> </li></ul><ul><li><!-OBX…observation result> </li></ul><ul><li>OBX|1|ST|84295^NA||150|mmol/l|136-148|H||A|F|19850301<CR> </li></ul><ul><li>OBX|2|ST|84132^K+||4.5|mmol/l|3.5-5|N||N|F|19850301<CR> </li></ul><ul><li>OBX|3|ST|82435^CL||102|mmol/l|94-105|N||N|F|19850301<CR> </li></ul><ul><li>OBX|4|ST|82374^CO2||27|mmol/l|24-31|N||N|F|19850301<CR> </li></ul>
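The OBX result lines in this example are pipe-delimited, with `^` separating components inside a field, so a basic reader is short. The sketch below parses the four OBX segments shown above into dictionaries. It is a simplification: it assumes the default delimiters and ignores escape sequences, repetitions, and the delimiter definitions carried in the MSH segment. The function name `parse_obx` and the dictionary keys are choices made for this example.

```python
# The four OBX segments from the slide, joined by carriage returns as in HL7 v2.
message = "\r".join([
    "OBX|1|ST|84295^NA||150|mmol/l|136-148|H||A|F|19850301",
    "OBX|2|ST|84132^K+||4.5|mmol/l|3.5-5|N||N|F|19850301",
    "OBX|3|ST|82435^CL||102|mmol/l|94-105|N||N|F|19850301",
    "OBX|4|ST|82374^CO2||27|mmol/l|24-31|N||N|F|19850301",
])


def parse_obx(segment):
    """Split one OBX segment into the fields used in this example."""
    fields = segment.split("|")
    if fields[0] != "OBX":
        raise ValueError("not an OBX segment: " + fields[0])
    code, _, name = fields[3].partition("^")  # observation identifier
    return {
        "set_id": int(fields[1]),
        "code": code,
        "name": name,
        "value": float(fields[5]),
        "units": fields[6],
        "reference_range": fields[7],
        "abnormal_flag": fields[8],  # H = high, N = normal in this example
    }


results = [parse_obx(seg) for seg in message.split("\r") if seg.startswith("OBX")]
abnormal = [r["name"] for r in results if r["abnormal_flag"] != "N"]
print(abnormal)
```

For production use, an established HL7 library is a safer choice than hand-rolled splitting, since real messages use many optional fields and escaping rules.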
  37. 37. <ul><li>Common data schema </li></ul><ul><ul><li>type (e.g. relational) </li></ul></ul><ul><ul><li>data modeling (i.e. column names) </li></ul></ul><ul><li>Common naming of data items </li></ul><ul><ul><li>e.g., “MI” vs. “myocardial infarction” </li></ul></ul><ul><li>For online data sharing and merging </li></ul><ul><ul><li>a physical connection between the computers </li></ul></ul><ul><ul><li>common data transmission protocols </li></ul></ul><ul><ul><ul><li>e.g., HL-7 </li></ul></ul></ul><ul><ul><li>common database communication protocol </li></ul></ul><ul><ul><ul><li>e.g., SQL sent over a TCP/IP network connection </li></ul></ul></ul>Prerequisites for Data Warehouse Construction
  38. 38. Data Warehouse Contents ??? Internet ADT Chem EMR XRay PMB Claims MICU Finance Research QA
  39. 39. Should Warehouse Schema = EMR Schema?
  40. 40. Clinical Data Warehouse Schema <ul><li>diagnoses would be ICD-9 codes </li></ul><ul><li>perhaps a separate table for admission diagnoses? </li></ul>
  41. 41. Research Data Warehouse Schema <ul><li>Should depend on anticipated queries </li></ul><ul><li>UCSF in midst of trying to understand this </li></ul><ul><ul><li>are queries mostly within a project? across projects? </li></ul></ul><ul><ul><li>for analysis of ongoing projects? or analysis across completed projects? both? </li></ul></ul><ul><ul><li>anonymized participant data? </li></ul></ul><ul><ul><li>what about participants from other study sites? </li></ul></ul><ul><ul><li>does administrative data (insurance) need to be there too? </li></ul></ul><ul><li>Scientific issues have huge implications for design (and eventual worth) of research warehouse </li></ul><ul><ul><li>if you don’t know what you want, no technology will give it to you </li></ul></ul>
  42. 42. Other CGIM Issues <ul><li>Does CGIM help project databases? data acquisition? </li></ul><ul><li>How would CGIM benefit single project? </li></ul><ul><li>Standard coding vocabulary? standard data representation? </li></ul><ul><li>Standard definition of clinical variables? </li></ul>(in SNOMED-CT) (in MAGE-ML) (in MAGE-ML) CGIM microarray B microarray A • Breast CA ( not DCIS) • Menopause • Osteoporosis (Heel US) • Menopause Project 1 DB 1 Project 2 DB 2 Project 3 DB 3 Project 4 DB 4 • Osteoporosis (DXA) • Menopause • Breast CA ( DCIS ok) • Alzheimers (path) Data mining/Display Tools Radiology STOR
  43. 43. Choosing a Vocabulary <ul><li>For an EMR </li></ul><ul><ul><li>billing: ICD-9, CPT </li></ul></ul><ul><ul><li>clinical data capture: SNOMED-CT best </li></ul></ul><ul><ul><ul><li>under US national site license (i.e., free for all) </li></ul></ul></ul><ul><ul><ul><li>hard to get docs to choose correct code out of 325,000 terms </li></ul></ul></ul><ul><ul><li>research: any is better than none! </li></ul></ul><ul><li>For your own research databases </li></ul><ul><ul><li>if standard domain-specific data dictionary exists, use it </li></ul></ul><ul><ul><li>if not, use a standard clinical vocabulary </li></ul></ul><ul><ul><ul><li>often ICD-9 or CPT, or SNOMED, or UMLS preferred terms </li></ul></ul></ul><ul><ul><li>try not to define your own terms and your own definitions </li></ul></ul><ul><ul><ul><li>upfront work will make it easier to share data later… </li></ul></ul></ul>
  44. 44. Data Warehouse Summary <ul><li>Enterprise viewpoint more appropriate for research than patient viewpoint of EMR </li></ul><ul><li>Integrates data from multiple sources </li></ul><ul><ul><li>need standardization of codes, definitions, and data formats </li></ul></ul><ul><li>Querying and processing occurs “offline” </li></ul><ul><ul><li>little impact on real-time clinical care </li></ul></ul><ul><li>Schema can evolve to optimize for analytic needs </li></ul><ul><ul><li>can make or modify tables off of legacy systems </li></ul></ul>
  45. 45. <ul><li>Compare 1-year re-admission rate for acute MI in diabetics discharged on β-blockers or not </li></ul><ul><ul><li>data captured in EMR and other databases </li></ul></ul><ul><ul><li>data aggregated in data warehouse </li></ul></ul><ul><ul><li>you query the data warehouse — NOT YET…. </li></ul></ul>Study Steps Using EMR
  46. 46. <ul><li>Sample Study </li></ul><ul><ul><li>a single-institution outcomes research question </li></ul></ul><ul><li>Electronic Medical Records (EMRs) </li></ul><ul><ul><li>relational databases </li></ul></ul><ul><ul><li>vocabulary </li></ul></ul><ul><li>Data Warehousing </li></ul><ul><li>Security and Privacy </li></ul>Outline
  47. 47. Privacy vs. Security <ul><li>Security (a technical feature) </li></ul><ul><ul><li>confidentiality </li></ul></ul><ul><ul><ul><li>ensuring that only authorized persons can read or copy information </li></ul></ul></ul><ul><ul><ul><ul><li>encryption of data during transmission impedes eavesdropping only </li></ul></ul></ul></ul><ul><ul><li>integrity </li></ul></ul><ul><ul><ul><li>ensuring that information is modified only in appropriate ways </li></ul></ul></ul><ul><ul><li>availability </li></ul></ul><ul><ul><ul><li>ensuring that information is not made inaccessible </li></ul></ul></ul><ul><li>Privacy (a legal concept) </li></ul><ul><ul><li>right to keep personal information from outside world </li></ul></ul><ul><ul><ul><li>study nurse, data entry clerk, investigator, database administrator, etc may be authorized to see data but may disclose it inappropriately </li></ul></ul></ul>
  48. 48. <ul><li>Physical security </li></ul><ul><ul><li>firewalls </li></ul></ul><ul><li>Encryption </li></ul><ul><ul><li>public/private keys </li></ul></ul><ul><li>People security </li></ul><ul><ul><li>authority </li></ul></ul><ul><ul><li>authentication </li></ul></ul><ul><ul><li>access </li></ul></ul><ul><ul><li>audit </li></ul></ul>Network Security Internet Firewall itsa jaundice LAN
  49. 49. <ul><li>Authentication </li></ul><ul><ul><li>are you who you say you are? </li></ul></ul><ul><ul><ul><li>use passwords, biometrics (e.g., retinal scan), smartcards </li></ul></ul></ul><ul><li>Authority </li></ul><ul><ul><li>do you have a need to know? </li></ul></ul><ul><ul><ul><li>different levels of data access for different users </li></ul></ul></ul><ul><li>Access </li></ul><ul><ul><li>how to allow only authenticated users to perform authorized activities on authorized data? </li></ul></ul><ul><li>Audit </li></ul><ul><ul><li>record of who actually got into what </li></ul></ul>People Security
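The four elements above fit together as a single gate in front of the data. The following is a minimal sketch, not a production design: the user, roles, permission names, and the in-memory `AUDIT_LOG` are all hypothetical, and passwords are checked with the standard-library `hashlib.pbkdf2_hmac` and `hmac.compare_digest`.

```python
import hashlib
import hmac
import os


def hash_password(password, salt):
    """Derive a password hash; the iteration count is an arbitrary example value."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


# Hypothetical user store and role table.
_salt = os.urandom(16)
USERS = {
    "rn_lee": {
        "salt": _salt,
        "hash": hash_password("correct horse", _salt),
        "role": "study_nurse",
    },
}
ROLE_PERMISSIONS = {
    "study_nurse": {"read_limited_data_set"},
    "dba": {"read_limited_data_set", "read_identified_data"},
}
AUDIT_LOG = []  # record of who tried to do what, and whether it was allowed


def authenticate(user, password):
    """Authentication: are you who you say you are?"""
    record = USERS.get(user)
    if record is None:
        return False
    candidate = hash_password(password, record["salt"])
    return hmac.compare_digest(record["hash"], candidate)


def access(user, password, action):
    """Access: allow only authenticated users to perform authorized actions."""
    allowed = authenticate(user, password) and (
        action in ROLE_PERMISSIONS.get(USERS[user]["role"], set())  # authority
    )
    AUDIT_LOG.append((user, action, allowed))  # audit
    return allowed


ok = access("rn_lee", "correct horse", "read_limited_data_set")
denied = access("rn_lee", "correct horse", "read_identified_data")
print(ok, denied, AUDIT_LOG)
```

Note that the audit log records denied attempts as well as successful ones, since both matter when reviewing who got into what.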
  50. 50. De-identification Isn’t Easy <ul><li>87% of the American populace can be uniquely identified by only [Sweeney, L. ‘97] </li></ul><ul><ul><li>date of birth </li></ul></ul><ul><ul><ul><li>in room of 23 people, what is chance that 2 people will share the same birthday (independent of year of birth)? </li></ul></ul></ul><ul><ul><ul><li> </li></ul></ul></ul><ul><ul><li>gender </li></ul></ul><ul><ul><li>five-digit ZIP code </li></ul></ul><ul><ul><li>easy to find someone’s info if you’re looking for it; harder to find out whose info it is that you have </li></ul></ul><ul><li>Anonymizing databases does not remove your duty to enforce security and safeguard privacy </li></ul>
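The birthday question on this slide has a standard answer that is easy to compute: multiply out the probability that all birthdays are distinct, then subtract from 1. The sketch assumes 365 equally likely birthdays and ignores leap years; the function name is chosen for this example.

```python
def shared_birthday_probability(n_people, days=365):
    """Probability that at least two of n_people share a birthday."""
    p_all_distinct = 1.0
    for i in range(n_people):
        # The (i+1)-th person must avoid the i birthdays already taken.
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


p23 = shared_birthday_probability(23)
print(f"{p23:.1%}")
```

For 23 people the result is just over one half, which is why coincidences on a single attribute are common while the combination of full date of birth, gender, and ZIP code is so identifying.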
  51. 51. Summary of Privacy & Security <ul><li>Computing/network infrastructure can deal with security </li></ul><ul><ul><li>but privacy is a policy matter </li></ul></ul><ul><li>Anonymizing of databases helps but it isn’t foolproof </li></ul><ul><li>In general, people are the weakest security and privacy link </li></ul>
  52. 52. <ul><li>Compare 1-year re-admission rate for acute MI in diabetics discharged on β-blockers or not </li></ul><ul><ul><li>data captured in EMR and other databases </li></ul></ul><ul><ul><li>data aggregated in data warehouse </li></ul></ul><ul><ul><li>you request IRB approval </li></ul></ul><ul><ul><li>you are authorized to conduct a HIPAA-compliant search (e.g., Limited Data Set) in data warehouse </li></ul></ul><ul><ul><li>an audit trail of queries is maintained </li></ul></ul>Outcomes Research Project
  53. 53. <ul><li>EMR does not always = easier clinical research </li></ul><ul><li>Structure and coding are critical </li></ul><ul><ul><li>structure: schema needed, designed to support intended queries </li></ul></ul><ul><ul><li>coding: standardized, coded data trumps free text </li></ul></ul><ul><ul><ul><li>especially important for research </li></ul></ul></ul><ul><ul><ul><li>but most standardized vocabularies have insufficient clinical coverage </li></ul></ul></ul><ul><ul><li>data formats: standard needed for genomic, imaging, etc. data </li></ul></ul><ul><li>Clinical/Research data warehouses could be useful for research but must be designed correctly with high-quality, cross-compatible data </li></ul>Take-Home Points