Chicago Health Atlas: The Promise, Process, and Problems of using electronic health record data for population health
Published in: Health & Medicine
Transcript of "Chicago Health Atlas: The Promise, Process, and Problems of using electronic health record data for population health"

  1. Chicago Health Atlas: The Promise, Process, and Problems of using electronic health record data for population health
     April 4, 2013
     Abel Kho, MD, Northwestern University
     Roderick (Eric) Jones, MPH, Chicago Dept. Public Health
  2. Session Preview
     • What is the Chicago Health Atlas?
     • The Promise: Contextual factors that play a role in the collaboration
     • The Process: Getting started, developing matching algorithms, minimizing reidentification risk
     • The Problems: Deriving meaning and delivering it to people who can use it
  3. Chicago Health Atlas is a . . . collaboration
     • Informatics researchers from multiple healthcare institutions
     • Chicago Regional Extension Center (CHITREC)
     • Chicago Community Trust
     • Chicago Department of Public Health
  4. Chicago Health Atlas is a . . . website
  5. Chicago Health Atlas is a . . . database
     • De-identified electronic health record data for ~1 million Chicagoans
     • In-patient and out-patient visits spanning 2006-2011
     • Individual patient records matched across institutions
  6. Chicago Context: Person, Place, Time
  7. Chicago: Person, Place, Time
     Group                 Percent change, 2000-2010   Percent of total in 2010
     Chicago               7                           [2.7 million]
     Non-Hispanic black    17                          32
     Non-Hispanic white    6                           32
     Hispanic              3                           29
     Non-Hispanic Asian    14                          5
  8. Chicago: Person, Place, Time
     • 229 square miles
     • 77 neighborhood “Community areas” with population median of 31,000 (range, 3,000 – 99,000)
     • All but two community areas have larger populations than the least-populated Illinois county
     [Map labels: Lake Michigan, O’Hare, Loop, Midway, Suburban Cook County]
     [Stem-and-leaf plot and boxplot of community area populations, in units of 10,000]
  9. Chicago Context:
  10. Healthy Chicago sets goals for . . .
     • Public policy and legislation (n=56)
     • Health education and awareness (n=45)
     • Interventions and programs (n=92)
  11. HEALTHY CHICAGO, Chicago Department of Public Health: Infrastructure
  12. Highlights: Infrastructure
     • Establish an Office of Epidemiology and Public Health Informatics
     • Expand epidemiology capacity through an increase in staff and the development of strategic partnerships with other entities who use or collect public health data
  13. NYC Macroscope Scientific Advisory Group
     • New York City has embarked on a study to validate population health estimates from its Primary Care Information Project
     • CDPH involvement has led to collaboration on developing vision and methodology for more widespread use of EHR data for public health
  14. Highlights: Infrastructure
     • Increase the availability of public health data through the City of Chicago website
  15. Chicago Context: Health Information Exchange
  16. Illinois Regional Health Information Exchanges
  17. Even if we don’t have a mature HIE or a Regenstrief Institute, is it possible to . . .
     • Leverage existing EHR data
     • Weave together data from multiple institutions with publicly available data
     • Measure disease burden and care delivered?
  18. Design Considerations
     • Limit sharing of any protected health information
     • Yet account for care of the same patient at multiple institutions
     • Protect anonymity of patients/providers/institutions
     • Enable linkage to new information and sources as they become available
       – Patient level
       – Geographic location
  19. Process – getting started
     • Coordinated IRB approval across multiple institutions
       – Constrained to adults aged 18-89
       – Limited to structured data, no free text
       – Focus on 606xx ZIP codes, with known overlapping care institutions and high population density
     • Instead of an EMPI, create a lightweight software application to pass identifiers through a standard set of preprocessing steps, and then “hash” the data
  20. Process: Hashing and Matching Methods
  21. How we “Hashed” our Data
     - Hash algorithms accept variable-size input messages and produce a small fixed-size output called a hash value or message digest
     - The hash is non-degenerate; in practice, only 1 input message per final hash value
     - The hash is 1-way; easy to go from message to hash value, very hard to go from hash value to message
     - We initially used an early hash, Secure Hash Algorithm-1 (SHA-1)
     http://csrc.nist.gov/publications/nistbul/b-May-2008.pdf
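The fixed-size property described on this slide can be checked with a minimal Python sketch. It uses only the standard `hashlib` module; the helper name `hex_digest` is hypothetical and is not part of the Atlas software.

```python
import hashlib

def hex_digest(message: str, algorithm: str = "sha1") -> str:
    """Return the hexadecimal digest of a UTF-8 encoded message."""
    return hashlib.new(algorithm, message.encode("utf-8")).hexdigest()

# SHA-1 digests are always 160 bits (40 hex characters), whatever the input size.
print(len(hex_digest("a")), len(hex_digest("a" * 10_000)))

# SHA-512 digests are always 512 bits (128 hex characters).
print(len(hex_digest("a", "sha512")))

# A one-character change in the input gives a different digest.
print(hex_digest("john") == hex_digest("jonn"))
```

The digest length depends only on the algorithm, which is why a hash can stand in for identifiers of any length.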
  22. Preliminary SHA-1 Single Institution Validation
     5-Variable Hash
       Inputs: William | Galanter | 3/31/1962 | M | SSN
       Concatenate: WilliamGalanter22732M123456789
       SHA1: 20802322ED366A1EFD562A6219C4D7AF993BADAD
     4-Variable Hash
       Inputs: William | Galanter | 3/31/1962 | M
       Concatenate & SHA1: 12345678901234567890123456789012345
  23. Updated Hash Method
     • SHA-1 was found to have a potential security issue, so we moved to a second-generation hash, SHA-512* (512 bit)
     • Significant focus on data pre-processing / normalization
     • Trimming spaces and non A-Z characters, lower case: _Jimmy__ O’Brien Jr. → jimmy, obrien
     • Remove “-” from SSN and remove all invalid combinations
     • Only allow birth year >1921
     • Use “F” and “M” for sex
     • Replace missing elements with missing data indicators
     *http://csrc.nist.gov/groups/ST/toolkit/secure_hashing.html
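The pre-processing steps on this slide can be sketched in Python as follows. This is an illustration under stated assumptions, not the team's Java program: the suffix list, the `#` missing-data indicator, the rule for an invalid SSN, and every function name are hypothetical.

```python
import hashlib
import re

SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}  # assumed suffix list, not from the slides
MISSING = "#"                               # hypothetical missing-data indicator

def normalize_name(raw: str) -> str:
    """Lower-case, drop suffix tokens, and keep only the letters a-z."""
    tokens = re.split(r"[\s_]+", raw.strip().lower())
    cleaned = [re.sub(r"[^a-z]", "", token) for token in tokens]
    kept = [token for token in cleaned if token and token not in SUFFIXES]
    return "".join(kept) or MISSING

def normalize_ssn(raw: str) -> str:
    """Remove dashes; treat anything but 9 non-identical digits as missing (assumed rule)."""
    digits = raw.replace("-", "").strip()
    if not re.fullmatch(r"\d{9}", digits) or len(set(digits)) == 1:
        return MISSING
    return digits

def normalize_dob(raw: str) -> str:
    """Convert M/D/YYYY to MMDDYYYY; birth years of 1921 or earlier count as missing."""
    match = re.fullmatch(r"(\d{1,2})/(\d{1,2})/(\d{4})", raw.strip())
    if not match or int(match.group(3)) <= 1921:
        return MISSING
    month, day, year = match.groups()
    return f"{int(month):02d}{int(day):02d}{year}"

def normalize_sex(raw: str) -> str:
    """Map any value starting with m or f to "M" or "F"."""
    return {"m": "M", "f": "F"}.get(raw.strip().lower()[:1], MISSING)

def record_hash(first: str, last: str, dob: str, sex: str, ssn: str) -> str:
    """Concatenate the normalized fields and hash them with SHA-512."""
    message = "".join([
        normalize_name(first), normalize_name(last),
        normalize_dob(dob), normalize_sex(sex), normalize_ssn(ssn),
    ])
    return hashlib.sha512(message.encode("utf-8")).hexdigest()

# Two differently typed versions of the same person yield the same hash.
a = record_hash("John", "O’Dwyer", "6/12/1970", "M", "987-65-4329")
b = record_hash(" john ", "O'Dwyer", "06/12/1970", "male", "987654329")
print(a == b)
```

Normalizing before hashing matters because a hash gives no partial credit: any difference in spacing, case, or punctuation would otherwise produce an unrelated digest.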
  24. Updated Hash Method (cont.)
     • Creates 5 hash IDs (with probability weights) depending on availability of last name, first name, date of birth (DOB), gender, SSN
       – All data available (1.0)
       – All fields except one: no DOB, or no first and last name, or no SSN (0.3)
       – All fields, but only first three letters of names available (0.1)
       – SOUNDEX codes (phonetic equivalents) of the first and last name plus date of birth and gender (0.1)
     • Wrapped up into a standalone Java program
     • Can readily consume other data sources (e.g. Social Security Death Index Tables)
  25. [Diagram: matching one patient across institutions]
     Institution A, Diabetes (250.xx)
       John | O’Dwyer | 6/12/1970 | 987-65-4329 | M
       → Pre-Process → john | odwyer | 06121970 | 987654329 | m
       → Hash Fxn → Hash ID-1, Hash ID-2, Hash ID-3, Hash ID-4, Hash ID-5
     Institution B, HTN (401.xx)
       John | O dwyer | 6/12/70 | male
       → Pre-Process → john | odwyer | 06121970 | m
       → Hash Fxn → Hash ID-1, Hash ID-2, Hash ID-3, Hash ID-4, Hash ID-5
     Institution C / Honest Broker
       Matched HashIDs → Replace with Unique StudyID → StudyID carrying 250.xx and 401.xx
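The honest-broker step in this diagram can be sketched as grouping records that share any hash ID and giving each group one study ID. The Python below is a simplified illustration: it ignores the probability weights from the previous slide, and the function name, record layout, and `S000001`-style study IDs are all hypothetical.

```python
from itertools import count

def assign_study_ids(records):
    """records: list of (institution, local_id, [hash_ids]) tuples.

    Returns {(institution, local_id): study_id}. Records that share at least
    one hash ID, directly or through a chain of records, get the same study ID.
    """
    parent = {}

    def find(key):
        # Follow parent links to the group representative, compressing the path.
        while parent[key] != key:
            parent[key] = parent[parent[key]]
            key = parent[key]
        return key

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}  # hash ID -> first record that carried it
    for institution, local_id, hash_ids in records:
        key = (institution, local_id)
        parent.setdefault(key, key)
        for hash_id in hash_ids:
            if hash_id in first_seen:
                union(first_seen[hash_id], key)
            else:
                first_seen[hash_id] = key

    numbering = count(1)
    study_id_of_root = {}
    result = {}
    for key in parent:
        root = find(key)
        if root not in study_id_of_root:
            study_id_of_root[root] = f"S{next(numbering):06d}"
        result[key] = study_id_of_root[root]
    return result

# Made-up hash IDs: the first two records share "h2", the third shares nothing.
example = [
    ("A", "p1", ["h1", "h2"]),
    ("B", "p9", ["h2", "h3"]),
    ("B", "p10", ["h7"]),
]
print(assign_study_ids(example))
```

Because the broker only ever sees hash IDs and local record keys, it can link the diabetes record at one site to the hypertension record at another without receiving names, dates of birth, or SSNs.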
  26. Data Dictionary
     • Standardized data specifications for data extractions from participating sites
       – Demographics
       – Vital signs
       – Diagnoses
         • Study ID | Month/Year | Encounter type | Encounter number | Diagnosis code
       – Medications
       – Laboratory tests
         • Study ID | Month/Year | Lab test name | Result | Units | Normal Range | Specimen type
  27. Process: Privacy and Re-Identification Considerations
     Courtesy of Brad Malin, Vanderbilt University
  28. De-Identified Health Information
     De-identified health information neither identifies nor provides a reasonable basis to identify an individual. There are two ways to de-identify information; either:
     (1) a formal determination by a qualified statistician; or
     (2) the removal of specified identifiers of the individual and of the individual’s relatives, household members, and employers is required, and is adequate only if the covered entity has no actual knowledge that the remaining information could be used to identify the individual.
  29. HIPAA Expert Determination (abridged)
     Certify via “generally accepted statistical and scientific principles & methods, that the risk is very small that the information could be used, alone or in combination with other reasonably available information, by the anticipated recipient to identify the subject of the information.”
  30. Uniqueness Analysis
     Model                  Uniques (%)   Uniques (People)
     Safe Harbor            0.000064%     13
  31. Uniqueness Analysis
     Model                  Uniques (%)   Uniques (People)
     Safe Harbor            0.000064%     13
     Chicago Health Atlas   0.3%          8,050
  32. Uniqueness Analysis
     Model                  Uniques (%)   Uniques (People)
     Safe Harbor            0.000064%     13
     Chicago Health Atlas   0.3%          8,050
  33. Completing the Re-identification Requires Resources
     [Diagram labels: Safe Harbored Records; Identified Clinical Records; Identified Population Records; Identified Resource; Chicago Health Atlas Model]
     • Could link to registries
       – Birth
       – Marriage
       – Death
       – Divorce
     • What’s in vogue? Voter registration DBs
     Benitez & Malin. JAMIA. 2010.
  34. Risk will Vary Across Regions
     Voter Registration Databases
     State   Who                                                    Format   Cost
     IL      Registered Political Committees (ANYONE – In Person)   Disk     $500
     MN      MN Voters                                              Disk     $46; “use ONLY for elections, political activities, or law enforcement”
     TN      Anyone                                                 Disk     $2500
     WA      Anyone                                                 Disk     $30
     WI      Anyone                                                 Disk     $12,500
     Fields compared by state: Name, Address, Date of Birth, Sex, Race [per-state check marks lost in the transcript]
     Benitez & Malin. JAMIA. 2010.
  35. Uniqueness Analysis
     Model                  Uniques (%)   Uniques (People)
     Safe Harbor            0.000064%     13
     Chicago Health Atlas   0.3%          8,050
  36. Uniqueness Analysis
     Model                  Uniques (%)    Uniques (People)
     Safe Harbor            0.000064%      13
     Chicago Health Atlas   0.3%           8,050
     Linked to Voter Registration
     Safe Harbor            Really small   0
     Chicago Health Atlas   0.004%         80
  37. Uniqueness Analysis
     Model                  Uniques (%)    Uniques (People)
     Safe Harbor            0.000064%      13
     Chicago Health Atlas   0.3%           8,050
     Linked to Voter Registration
     Safe Harbor            Really small   0
     Chicago Health Atlas   0.004%         80
  38. Next Steps
     • Consider re-identification risk options
       – Coarsen ZIP codes
       – Coarsen Ethnicities
       – Coarsen Age groups
     • Search* for tradeoffs between information utility (e.g., epidemiologic findings) and privacy (i.e., re-identification risk)
     *Benitez & Malin. JAMIA. 2011.
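The effect of coarsening on uniqueness can be illustrated with a small Python sketch. It counts records whose combination of quasi-identifiers appears only once, before and after coarsening ZIP code and age. The records are made up for illustration, and the coarsening choices (3-digit ZIP prefix, 10-year age bands) are assumptions rather than the options the team evaluated.

```python
from collections import Counter

def count_uniques(records, key):
    """Count records whose key (tuple of quasi-identifiers) occurs exactly once."""
    combos = Counter(key(record) for record in records)
    return sum(1 for record in records if combos[key(record)] == 1)

def exact_key(record):
    return (record["zip"], record["age"], record["sex"], record["ethnicity"])

def coarse_key(record):
    # Assumed coarsening: 3-digit ZIP prefix and 10-year age band.
    return (record["zip"][:3], record["age"] // 10 * 10, record["sex"], record["ethnicity"])

# Made-up records for illustration only.
toy_records = [
    {"zip": "60601", "age": 34, "sex": "F", "ethnicity": "Hispanic"},
    {"zip": "60602", "age": 36, "sex": "F", "ethnicity": "Hispanic"},
    {"zip": "60601", "age": 34, "sex": "F", "ethnicity": "Hispanic"},
    {"zip": "60640", "age": 71, "sex": "M", "ethnicity": "Non-Hispanic Asian"},
]

print(count_uniques(toy_records, exact_key))   # uniques with full detail
print(count_uniques(toy_records, coarse_key))  # uniques after coarsening
```

Coarsening lowers the number of uniques but also blurs the geographic and demographic detail that makes the estimates useful, which is the tradeoff named on the slide.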
  39. Findings: A promising source of prevalence estimates
  40. Data contribution summary, April 2013
                        Institution
     Data Type          1    2    3    4    5    6
     Demographics       C    C    C    C    C    PC
     Diagnoses          C    C    C    C    C    PC
     Visit type         C    C    C    C    C    PC
     BMI, BP            C    PP   N    N    N    PC
     Glucose, HbA1c     C    C    C    N    N    PC
     Medications        C    C    C    N    N    PC
     C: complete; N: not yet incorporated; PP: partial time period; PC: partial cohort
  41. How many patients receive care at more than one institution?
     No. of institutions   Number    %      Cumulative %
     4 or 5                393       0.0    0.0
     3                     8,409     0.9    0.9
     2                     74,372    7.6    8.5
     1                     892,468   91.4   100.0
     Includes the 5 institutions with all patient visits 2006-2010 submitted (as of April 2013).
  42. Sample size/cohort comparison, by residential ZIP code, BRFSS* vs. Chicago Health Atlas
     Source                                           Min     Median   Mean    Max
     IL BRFSS, Chicago 2011 respondents               4       15       16      33
     Chicago Health Atlas, patient with 2010 visit    1,339   10,031   9,270   21,289
     *CDC Behavioral Risk Factor Surveillance System survey, Chicago sub-sample from Illinois dataset.
  43. Diabetes prevalence estimate by residential ZIP
     Percent = (# of patients with > 1 diabetes mellitus diagnosis code) / (# of patients with visit in 2006-2010)
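The ratio on this slide can be computed per ZIP code along the following lines. This Python sketch assumes each patient record already lists the diagnosis codes from 2006-2010 visits; the record layout and function name are hypothetical. The default `threshold=1` follows the "> 1" printed on the slide; if "at least one code" is what was intended, `threshold=0` gives that reading.

```python
from collections import defaultdict

def prevalence_by_zip(patients, code_prefix="250.", threshold=1):
    """Percent of patients per ZIP with more than `threshold` matching diagnosis codes.

    patients: iterable of dicts with keys "zip" and "dx_codes" (list of code
    strings), one dict per patient with a visit in the study period.
    """
    numerator = defaultdict(int)
    denominator = defaultdict(int)
    for patient in patients:
        denominator[patient["zip"]] += 1
        matching = sum(1 for code in patient["dx_codes"] if code.startswith(code_prefix))
        if matching > threshold:
            numerator[patient["zip"]] += 1
    return {z: 100.0 * numerator[z] / denominator[z] for z in denominator}

# Made-up patients for illustration only.
toy_patients = [
    {"zip": "60601", "dx_codes": ["250.00", "250.02"]},
    {"zip": "60601", "dx_codes": ["401.9"]},
    {"zip": "60601", "dx_codes": ["250.00"]},
    {"zip": "60602", "dx_codes": ["250.00", "250.00", "401.9"]},
]
print(prevalence_by_zip(toy_patients))
```

Note that the denominator is patients with a visit, not residents, so the result describes the patient population of each ZIP code rather than everyone who lives there.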
  44. Finding type 2 diabetes in the health record
     • Diagnosis codes
     • Labs
     • Medications
     • Number of visits
     Outcome: Yes, patient has type 2 diabetes / No, patient does not have type 2 diabetes
  45. Diabetes prevalence estimate by residential ZIP
     Percent = (# of patients with > 1 diabetes mellitus diagnosis code or lab criteria met) / (# of patients with visit in 2006-2010)
  46. Percent of Atlas patients with diabetes diagnosis in 2006-2010
     [Chart: Percent, by minimum number of visits recorded]
     Illinois BRFSS estimates the prevalence of diabetes in Chicago at 9-11%.
  47. Hypertension prevalence estimate by residential ZIP
     Percent = (# of patients with > 1 hypertension diagnosis code) / (# of patients with visit in 2006-2010)
  48. Coronary heart disease prevalence estimate by residential ZIP
     Percent = (# of patients with > 1 CHD diagnosis code) / (# of patients with visit in 2006-2010)
  49. Gunshot wound prevalence estimate by residential ZIP
     Percent = (# of patients with > 1 gunshot wound diagnosis code) / (# of patients with visit in 2006-2010)
  50. Problem: Applying estimates to Chicago – rather than patient – populations
  51. Age distribution comparison, 2010
     [Chart: Percent, by age groups]
  52. Race-ethnicity comparison
                            Percent of total
     Group                  Atlas   2010 Census
     Non-Hispanic black     31      32
     Non-Hispanic white     20      32
     Hispanic               14      29
     Non-Hispanic Asian     4       5
     Not given/Unknown      31      0
  53. Geographic coverage by residential ZIP
     Percent = (# of patients with visit in 2010) / (2010 Census population)
  54. Problem: ZIP Codes aren’t meaningful geographic units
  55. Imputation of ZIP code rates to community area
     Diabetes hospitalization, 2010
     [Map panels: Rates by ZIP; Imputed using age & sex; Imputed using age, sex, & race-ethnicity]
  56. Imputation of ZIP code rates to community area
     Diabetes hospitalization, 2010
     [Map panels: Rates by ZIP; Imputed using age & sex; Imputed using age, sex, & race-ethnicity]
  57. Maps courtesy of Chieko Maene, University of Chicago, as part of CDPH-UC Diabetes Translational Research Collaboration.
  58. Dasymetric areal interpolation
     1. Calculate for each ZIP code
        Male & female x 19 age groups = 28 rates
        or
        Male & female x 19 age groups x 4 race-ethnicity groups = 84 rates
     2. Apply rates to corresponding population group in each census block to get counts
     3. Sum counts to Community area
     4. Calculate rates based on community area population denominators
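The four steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: each census block is taken to lie in exactly one ZIP code and one community area, the ZIP-level rates from step 1 are supplied as input, and the data layout, function name, and toy numbers are hypothetical.

```python
from collections import defaultdict

def interpolate_rates(zip_rates, blocks):
    """Re-express ZIP-level group rates as community-area rates.

    zip_rates: {zip: {group: rate}}, where group is e.g. (sex, age_group).
    blocks: list of dicts with keys "zip", "community_area", and
            "population" ({group: number of residents in the block}).
    """
    expected_cases = defaultdict(float)
    population = defaultdict(int)
    for block in blocks:
        rates = zip_rates[block["zip"]]
        area = block["community_area"]
        for group, people in block["population"].items():
            # Step 2: apply the ZIP rate for this group to the block's residents.
            expected_cases[area] += rates[group] * people
            # Step 3: counts and denominators accumulate by community area.
            population[area] += people
    # Step 4: rate = expected cases / community area population.
    return {area: expected_cases[area] / population[area]
            for area in population if population[area] > 0}

# Made-up rates and block populations for illustration only.
toy_rates = {
    "60601": {("F", "18-24"): 0.10, ("M", "18-24"): 0.20},
    "60602": {("F", "18-24"): 0.30, ("M", "18-24"): 0.40},
}
toy_blocks = [
    {"zip": "60601", "community_area": "Loop",
     "population": {("F", "18-24"): 10, ("M", "18-24"): 10}},
    {"zip": "60602", "community_area": "Loop",
     "population": {("F", "18-24"): 10, ("M", "18-24"): 0}},
    {"zip": "60602", "community_area": "Near North",
     "population": {("M", "18-24"): 5}},
]
print(interpolate_rates(toy_rates, toy_blocks))
```

Working at the census-block level lets a community area that straddles several ZIP codes inherit each ZIP's rates in proportion to where its residents actually live.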
  59. Dataset description elements
     • Description (who, what, where, when)
     • Definitions
     • Calculations and formulas
     • Limitations, disclaimers, sources of error
     • Benchmarks and references
  60. Chicago Health Atlas Funders
     • Otho S.A. Sprague Institute
     • Northwestern Memorial Hospital Community Engagement
  61. Health Atlas Team
     • Northwestern University: John Cashy, Anna Roberts, Sara Lake
     • Univ. of Illinois-Chicago: Bill Galanter, John Lazaro
     • Cook County Hospital System: Bala Hota, Amanda Grasso
     • Univ. of Chicago Medical Center: Chris Lyttle, Ben Vekhter, David Meltzer
     • Alliance of Chicago: Erin Kaleba, Fred Rachman, Jermaine Dellahousaye
     • Rush University Medical Center: Shannon Sims, Aaron Tabor
     • Vanderbilt University: Brad Malin
     • UIC Intern team: Ariadna Garcia, Pravin Babu Karuppaiah, Shazia Sathar, Ulas Keles (Sid Battacharya, Faculty mentor)
  62. Contact
     facebook.com/ChicagoPublicHealth
     @ChiPublicHealth
     HealthyChicago@CityofChicago.org
     312.747.9884
     CityofChicago.org/Health
