Claudia Medina: Linking Health Records for Population Health Research in Brazil

Talk by Claudia Medina Coeli on the 1st Symposium of Big Data and Public Health, 2013

  • 1. 1st Symposium on Big Data and Public Health, 2013. Linking Health Records for Population Health Research in Brazil. Cláudia Medina Coeli, UFRJ Labmecs
  • 2. Record Linkage: The process of identifying and merging records across different databases that correspond to the same entity (for example, the same individual). This process creates a new database that has more variables than any single linked database. It can also be used to identify records that refer to the same entity within a single database, i.e., for deduplication (removing duplicate records or merging them into a combined record).
  • 3. Record Linkage: Record linkage is relatively easy when a unique identifier, such as a health insurance number, is available in the databases to be linked. In the absence of a unique identifier, the process relies on similar personal identifiers (e.g., name, sex, date of birth, address) and requires techniques that deal with problems such as typographical errors or variations, time-sensitive data (e.g., address), and large databases.
  • 4. Record Linkage: Data pre-processing: data cleaning, standardization of codes and formats, parsing (name, address). Indexing (blocking): comparisons are restricted to records that agree on a blocking key (e.g., soundex(first name) + sex). Comparison: approximate comparison functions (partial agreement), producing a vector of numerical similarities. Classification: rule-based, probabilistic, or machine learning approaches. Clerical review: manual inspection (tedious and labour-intensive). Evaluation: accuracy studies. (Christen P, 2012)
  • 5. The Record Linkage Process: ...“For more than a decade, most of the methodological research has been in the computer science literature”... “Many applications are still in the epidemiological or health informatics literature with most individuals using government health agency shareware based on the Fellegi-Sunter model”... (William E Winkler, 2012)
  • 6. The record linkage approaches most frequently used in the Brazilian health sector: Probabilistic (Fellegi-Sunter model): uses approximate comparison functions. Different weights are assigned to each field based on its discriminating power and vulnerability to error. A number of commercial and open source software packages are available. Deterministic: uses exact comparison functions and a rule-based classification approach. Rules are developed based on expert knowledge. Specific computer routines need to be developed for each problem.
  • 7. Classification model: Probabilistic; Rule-based
  • 8. Software: Febrl; Reclink/OpenRecLink; LinkPlus; Open Source Record Matching
  • 9. OpenRecLink: Open source; multi-platform; multiple language support; new database back-end; PostgreSQL integration; new deduplication routine; better performance (Linux Ubuntu 64 bits).
  • 10. Accuracy of a probabilistic record linkage strategy applied to identify deaths among cases reported to the Brazilian AIDS surveillance database*. Study population: all AIDS cases reported in SINAN with date of diagnosis between 2002 and 2005. Imperfect gold standard: Known dead: case with a date of death recorded in the surveillance database (N = 19,750). Known alive: no date of death recorded in the surveillance database and found registered in the laboratory database in 2006 (N = 38,675). Linkage result versus gold standard: of the 19,750 known dead, linkage classified 17,301 as dead and 2,449 as alive; of the 38,675 known alive, linkage classified 155 as dead and 38,520 as alive. Global sensitivity (Se) = 87.6%; specificity (Sp) = 99.6%. *Fonseca et al, CSP 26(7), 2010.
  • 11. Results of the Internal Validation Study*: Global sensitivity (Se) = 87.6%; specificity (Sp) = 99.6%. *Fonseca et al, CSP 26(7), 2010.
  • 12. Impact of linkage errors on risk ratios: In longitudinal mortality studies, linkage errors introduce outcome misclassification, making risk ratio estimates prone to bias. Risk ratios will not be biased if all three of the following conditions hold: (1) exposure and outcome misclassification errors are independent; (2) the outcome misclassification is non-differential with regard to the exposure levels; (3) specificity is 100%.
  • 14. Thank you. Laboratório de Métodos Epidemiológicos, Estatísticos e Computacionais em Saúde (LABMECS/IESC/UFRJ)
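
The indexing and comparison steps on slide 4 can be sketched roughly as follows. This is a minimal illustration, not the method used by any of the tools named in the talk: the records are made up, the field names (`name`, `sex`, `dob`) are assumptions, and `difflib.SequenceMatcher` stands in for whatever approximate string comparator a real linkage package would use.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# American Soundex letter codes; vowels, H, W and Y carry no code.
_CODES = {ch: digit
          for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                                 ("L", "4"), ("MN", "5"), ("R", "6"))
          for ch in letters}

def soundex(name: str) -> str:
    """Return the 4-character Soundex code of a name ('' for empty input)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        if ch not in "HW":          # H and W do not separate equal codes
            prev = code
    return (result + "000")[:4]

def blocking_key(record):
    """Blocking key from slide 4: soundex(first name) + sex."""
    return soundex(record["name"].split()[0]), record["sex"]

def candidate_pairs(file_a, file_b):
    """Yield only the record pairs that agree on the blocking key."""
    index = defaultdict(list)
    for rec in file_b:
        index[blocking_key(rec)].append(rec)
    for rec_a in file_a:
        for rec_b in index.get(blocking_key(rec_a), []):
            yield rec_a, rec_b

def compare(rec_a, rec_b):
    """Comparison vector: approximate name similarity, exact date-of-birth agreement."""
    name_sim = SequenceMatcher(None, rec_a["name"].lower(), rec_b["name"].lower()).ratio()
    dob_sim = 1.0 if rec_a["dob"] == rec_b["dob"] else 0.0
    return name_sim, dob_sim

# Made-up records, for illustration only.
file_a = [{"id": "a1", "name": "Maria Souza", "sex": "F", "dob": "1980-05-17"},
          {"id": "a2", "name": "Joao Pereira", "sex": "M", "dob": "1975-11-02"}]
file_b = [{"id": "b1", "name": "Maria Sousa", "sex": "F", "dob": "1980-05-17"},
          {"id": "b2", "name": "Marta Lima", "sex": "F", "dob": "1991-01-30"},
          {"id": "b3", "name": "Jose Pereira", "sex": "M", "dob": "1975-11-02"}]

pairs = list(candidate_pairs(file_a, file_b))
vectors = {(a["id"], b["id"]): compare(a, b) for a, b in pairs}
```

Blocking cuts the six possible cross-file pairs down to the ones sharing a key, at the cost of missing true matches whose first names fall into different Soundex codes; that trade-off is one reason accuracy studies are listed as a separate step.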
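
Slide 6 says the probabilistic approach assigns each field a weight reflecting its discriminating power and vulnerability to error. In the Fellegi-Sunter framework this is usually expressed through two probabilities per field: m (the field agrees given a true match) and u (the field agrees given a non-match). The sketch below assumes binary agreement per field; the m and u values and the two thresholds are hypothetical and were chosen only to make the example readable.

```python
import math

# Hypothetical (m, u) probabilities per field, for illustration only.
M_U = {
    "name": (0.95, 0.01),
    "sex": (0.98, 0.50),
    "dob": (0.97, 0.003),
}

def field_weight(m: float, u: float, agrees: bool) -> float:
    """Log2 likelihood-ratio weight for one field."""
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))

def match_score(agreement: dict) -> float:
    """Sum the field weights for one record pair's agreement pattern."""
    return sum(field_weight(*M_U[field], agreement[field]) for field in M_U)

def classify(score: float, lower: float = 0.0, upper: float = 10.0) -> str:
    """Two hypothetical thresholds split pairs into three classes."""
    if score >= upper:
        return "match"
    if score <= lower:
        return "non-match"
    return "clerical review"

all_agree = match_score({"name": True, "sex": True, "dob": True})
dob_differs = match_score({"name": True, "sex": True, "dob": False})
all_differ = match_score({"name": False, "sex": False, "dob": False})
```

Agreement on sex adds little weight because half of all non-matching pairs agree on it by chance, whereas agreement on date of birth adds a lot. Pairs scoring between the two thresholds are the ones that would go to clerical review.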
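
The sensitivity and specificity reported on slide 10 follow directly from the cell counts in its table, reading rows as the gold standard and columns as the linkage result. A short check, using only the counts given on the slide (the variable names are illustrative):

```python
# Cell counts from the slide 10 table (gold standard in rows, linkage result in columns).
dead_linked_dead = 17301    # true positives
dead_linked_alive = 2449    # false negatives
alive_linked_dead = 155     # false positives
alive_linked_alive = 38520  # true negatives

sensitivity = dead_linked_dead / (dead_linked_dead + dead_linked_alive)
specificity = alive_linked_alive / (alive_linked_alive + alive_linked_dead)

print(f"Se = {100 * sensitivity:.1f}%, Sp = {100 * specificity:.1f}%")
```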
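
The third condition on slide 12 (specificity of 100%) can be illustrated with a small calculation. Assuming outcome misclassification that is non-differential and independent of exposure, the expected observed risk in a group with true risk p is Se·p + (1 − Sp)·(1 − p). The true risks below (10% in the exposed, 5% in the unexposed) are hypothetical; the Se and Sp values are borrowed from slide 10 purely as plausible inputs.

```python
def observed_risk(true_risk: float, se: float, sp: float) -> float:
    """Expected observed risk under non-differential outcome misclassification."""
    return se * true_risk + (1 - sp) * (1 - true_risk)

def observed_risk_ratio(risk_exposed: float, risk_unexposed: float,
                        se: float, sp: float) -> float:
    """Risk ratio computed from the misclassified outcome."""
    return observed_risk(risk_exposed, se, sp) / observed_risk(risk_unexposed, se, sp)

# Hypothetical true risks: true risk ratio = 2.0.
risk_exposed, risk_unexposed = 0.10, 0.05

rr_perfect_sp = observed_risk_ratio(risk_exposed, risk_unexposed, se=0.876, sp=1.0)
rr_imperfect_sp = observed_risk_ratio(risk_exposed, risk_unexposed, se=0.876, sp=0.996)

print(f"Sp = 100%:  RR = {rr_perfect_sp:.3f}")
print(f"Sp = 99.6%: RR = {rr_imperfect_sp:.3f}")
```

With perfect specificity the sensitivity cancels out of the ratio, so missed deaths alone do not bias the risk ratio. Once specificity drops below 100%, false positive links add the same absolute amount to both groups and pull the ratio toward 1.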