1st Symposium on Big Data and Public Health - 2013

Linking Health Records for
Population Health Research in Brazil.



Talk by Cláudia Medina Coeli at the 1st Symposium on Big Data and Public Health, 2013.


  1. 1st Symposium on Big Data and Public Health - 2013. Linking Health Records for Population Health Research in Brazil. Cláudia Medina Coeli, UFRJ Labmecs
  2. Record Linkage: The process of identifying and merging records across different databases that correspond to the same entity (for example, the same individual). This process creates a new database with more variables than any single source database. It can also identify records that refer to the same entity within a single database, where it is used for deduplication (removing duplicate records or merging them into a combined record).
  3. Record Linkage: Record linkage is relatively easy when a unique identifier, such as a health insurance number, is available in the databases to be linked. In the absence of a unique identifier, the process relies on the similarity of personal identifiers (e.g., name, sex, date of birth, address). It then requires techniques that deal with typographical errors or variations, time-sensitive data (e.g., address), and large databases.
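To illustrate how typographical variations in personal identifiers can be tolerated, here is a minimal sketch that scores name similarity with Python's standard `difflib.SequenceMatcher`. The helper name `similarity`, the normalization steps, and the example names are assumptions made for illustration; the talk does not say which comparison function was used.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity score in [0, 1] after simple normalization.

    Normalization (lowercasing, collapsing whitespace) is an assumption
    for this sketch, not a step taken from the talk.
    """
    norm_a = " ".join(a.lower().split())
    norm_b = " ".join(b.lower().split())
    return SequenceMatcher(None, norm_a, norm_b).ratio()


# Hypothetical names: same person with formatting noise, a one-letter
# spelling variant, and an unrelated person.
same_after_cleaning = similarity("Maria Souza", "maria  souza")
spelling_variant = similarity("Maria Souza", "Maria Sousa")
unrelated = similarity("Maria Souza", "Joao Pereira")

print(same_after_cleaning, spelling_variant, unrelated)
```

An exact comparison would treat the spelling variant as a plain disagreement, whereas a similarity score lets it count as partial agreement.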
  4. Record Linkage:
     • Data pre-processing: data cleaning; standardization of codes and formats; parsing (name, address).
     • Indexing (blocking): comparisons are restricted to records that agree on a blocking key (e.g., soundex(first name) + sex).
     • Comparison: approximate comparison functions (partial agreement), yielding a vector of numerical similarities.
     • Classification: rule-based, probabilistic, or machine learning approaches.
     • Clerical review: manual inspection (tedious and labour-intensive).
     • Evaluation: accuracy studies.
     Christen P, 2012.
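The blocking step can be sketched as follows, using the blocking key given on the slide (Soundex of the first name plus sex). This is a minimal illustration, not the implementation used by any of the tools named in the talk: the record layout, the field names `first_name` and `sex`, the helper names, and the sample records are all hypothetical, and the Soundex function follows the standard American Soundex rules after stripping accents.

```python
import unicodedata
from collections import defaultdict

# Standard American Soundex consonant codes.
_GROUPS = {"1": "bfpv", "2": "cgjkqsxz", "3": "dt", "4": "l", "5": "mn", "6": "r"}
CODES = {ch: digit for digit, letters in _GROUPS.items() for ch in letters}


def strip_accents(text: str) -> str:
    """Remove diacritics so that 'Cláudia' and 'Claudia' are coded alike."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def soundex(name: str) -> str:
    """Return the 4-character Soundex code of a name ('' if no letters)."""
    letters = [c for c in strip_accents(name).lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code is kept
    return (result + "000")[:4]


def blocking_key(record: dict) -> tuple:
    return (soundex(record["first_name"]), record["sex"])


def candidate_pairs(file_a: list, file_b: list) -> list:
    """Pair only records from the two files that share a blocking key."""
    blocks = defaultdict(list)
    for rec_b in file_b:
        blocks[blocking_key(rec_b)].append(rec_b)
    pairs = []
    for rec_a in file_a:
        for rec_b in blocks.get(blocking_key(rec_a), []):
            pairs.append((rec_a["id"], rec_b["id"]))
    return pairs


# Hypothetical records.
file_a = [
    {"id": "a1", "first_name": "Cláudia", "sex": "F"},
    {"id": "a2", "first_name": "Roberto", "sex": "M"},
]
file_b = [
    {"id": "b1", "first_name": "Claudya", "sex": "F"},
    {"id": "b2", "first_name": "Ruperto", "sex": "M"},
    {"id": "b3", "first_name": "Marcos", "sex": "M"},
]

pairs = candidate_pairs(file_a, file_b)
print(pairs)
```

Without blocking, every record in one file would be compared with every record in the other (6 pairs here); with the blocking key only 2 candidate pairs go on to the comparison step. The trade-off is that true matches whose blocking keys disagree are never compared.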
  5. The Record Linkage Process: ..."For more than a decade, most of the methodological research has been in the computer science literature"... ..."Many applications are still in the epidemiological or health informatics literature with most individuals using government health agency shareware based on the Fellegi-Sunter model"... William E Winkler, 2012.
  6. The record linkage approaches most frequently used in the Brazilian health sector:
     • Probabilistic (Fellegi-Sunter model): uses approximate comparison functions. Different weights are assigned to each field based on its discriminating power and vulnerability to error. A number of commercial and open source software packages are available.
     • Deterministic: uses exact comparison functions and a rule-based classification approach. Rules are developed based on expert knowledge. Specific computer routines need to be developed for each problem.
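The field weighting in the probabilistic approach can be sketched with the usual Fellegi-Sunter log-likelihood weights: for each field, m is the probability of agreement among true matches and u the probability of agreement among non-matches; agreement adds log2(m/u) and disagreement adds log2((1-m)/(1-u)). The m and u values, the thresholds, and the function names below are hypothetical, chosen only for illustration. For simplicity the sketch uses binary agreement rather than the approximate comparison functions mentioned on the slide.

```python
import math

# Hypothetical (m, u) probabilities per field; real values are estimated
# from the data being linked.
PARAMS = {
    "name": (0.92, 0.01),
    "sex": (0.98, 0.50),
    "birth_date": (0.90, 0.05),
}


def field_weight(m: float, u: float, agrees: bool) -> float:
    """Agreement weight log2(m/u) or disagreement weight log2((1-m)/(1-u))."""
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def match_score(agreement: dict) -> float:
    """Sum the field weights for one candidate pair."""
    total = 0.0
    for field, agrees in agreement.items():
        m, u = PARAMS[field]
        total += field_weight(m, u, agrees)
    return total


def classify(score: float, lower: float = 0.0, upper: float = 8.0) -> str:
    """Two-threshold rule; the threshold values are assumptions."""
    if score >= upper:
        return "link"
    if score <= lower:
        return "non-link"
    return "possible link (clerical review)"


all_agree = match_score({"name": True, "sex": True, "birth_date": True})
date_differs = match_score({"name": True, "sex": True, "birth_date": False})
none_agree = match_score({"name": False, "sex": False, "birth_date": False})

for score in (all_agree, date_differs, none_agree):
    print(round(score, 2), classify(score))
```

A field such as sex agrees by chance in about half of non-matching pairs, so its agreement weight is small, while agreement on name or date of birth carries much more evidence. This is the sense in which weights reflect discriminating power and vulnerability to error.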
  7. Classification model: Probabilistic; Rule-based.
  8. Software: Febrl; Reclink/OpenRecLink; LinkPlus; Open Source Record Matching.
  9. OpenReclink: open source (http://reclink.sourceforge.net/); multi-platform; multiple language support; new database back-end; PostgreSQL integration; new deduplication routine; better performance (Linux Ubuntu 64 bits).
  10. Accuracy of a probabilistic record linkage strategy applied to identify deaths among cases reported to the Brazilian AIDS surveillance database*.
      Study population: all AIDS cases reported in SINAN with date of diagnosis between 2002 and 2005.
      Imperfect gold standard:
      • Known death: case with a date of death recorded in the surveillance database (N = 19,750).
      • Known alive: no date of death recorded in the surveillance database and found registered in the laboratory database in 2006 (N = 36,675).

      Gold standard | Linkage: Dead | Linkage: Alive | Total
      Dead          | 17,301        | 2,449          | 19,750
      Alive         | 155           | 38,520         | 38,675

      Global sensitivity (Se) = 87.6%; specificity (Sp) = 99.6%.
      *Fonseca et al, CSP 26(7), 2010.
  11. Results of the Internal Validation Study*: global sensitivity (Se) = 87.6%; specificity (Sp) = 99.6%. *Fonseca et al, CSP 26(7), 2010.
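The global sensitivity and specificity reported for the validation study can be recomputed from the 2×2 counts on slide 10, reading each row as the gold standard status and each column as the linkage result. The variable names are assumptions made for this sketch; the counts are those shown on the slide.

```python
# Counts from slide 10: gold standard status versus linkage result.
true_pos = 17301   # known dead, linked as dead
false_neg = 2449   # known dead, linked as alive
false_pos = 155    # known alive, linked as dead
true_neg = 38520   # known alive, linked as alive

# Sensitivity: share of known deaths that the linkage identified.
sensitivity = true_pos / (true_pos + false_neg)
# Specificity: share of known survivors that the linkage left unlinked.
specificity = true_neg / (true_neg + false_pos)

print(f"Se = {sensitivity:.1%}, Sp = {specificity:.1%}")
# Se = 87.6%, Sp = 99.6%
```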
  12. Impact of linkage errors on risk ratios: In longitudinal mortality studies, linkage errors introduce outcome misclassification, making risk ratio estimates prone to bias. Risk ratios will not be biased if all three conditions hold: (1) exposure and outcome misclassification errors are independent; (2) the outcome misclassification is non-differential with regard to the exposure levels; (3) specificity is 100%.
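The role of the specificity condition can be checked numerically. Under non-differential outcome misclassification, the expected observed risk in a group with true risk p is Se·p + (1 − Sp)·(1 − p). The sketch below applies this to two exposure groups. The true risks (10% and 5%) are hypothetical; the sensitivity and specificity are the global values reported in the validation study, and the function names are assumptions.

```python
import math


def observed_risk(true_risk: float, se: float, sp: float) -> float:
    """Expected observed risk: true positives plus false positives."""
    return se * true_risk + (1 - sp) * (1 - true_risk)


def observed_risk_ratio(p_exposed: float, p_unexposed: float,
                        se: float, sp: float) -> float:
    """Risk ratio after applying the same Se and Sp to both groups."""
    return observed_risk(p_exposed, se, sp) / observed_risk(p_unexposed, se, sp)


# Hypothetical true risks giving a true risk ratio of 2.0.
p_exposed, p_unexposed = 0.10, 0.05
true_rr = p_exposed / p_unexposed

# Perfect specificity: sensitivity cancels out of the ratio.
rr_perfect_sp = observed_risk_ratio(p_exposed, p_unexposed, se=0.876, sp=1.0)

# Imperfect specificity: false positives are added to both groups.
rr_imperfect_sp = observed_risk_ratio(p_exposed, p_unexposed, se=0.876, sp=0.996)

print(true_rr, round(rr_perfect_sp, 3), round(rr_imperfect_sp, 3))
```

With 100% specificity, imperfect sensitivity scales both observed risks by the same factor, so the ratio is unchanged. With specificity below 100%, false positives inflate both risks by a similar absolute amount, which pulls the observed risk ratio toward 1.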
  13. http://www.ihdln.org
  14. Thank you. Laboratório de Métodos Epidemiológicos, Estatísticos e Computacionais em Saúde (LABMECS/IESC/UFRJ). http://www.iesc.ufrj.br/posgrad/posgraduacao/ coeli@iesc.ufrj.br
