Claudia Medina: Linking Health Records for Population Health Research in Brazil.
1. 1st Symposium on Big Data and Public Health - 2013
Linking Health Records for
Population Health Research in Brazil.
Cláudia Medina Coeli
UFRJ
Labmecs
2. Record Linkage:
The process of identifying and merging records across
different databases that correspond to the same entity
(for example, the same individual).
This process creates a new database that has more
variables than each single database linked.
It can also be used to identify records that refer to the
same entity within a single database, that is, for
deduplication (removal of duplicate records or
merging them into a combined record).
3. Record Linkage:
Record linkage is made relatively easy when a unique
identifier, such as a health insurance number, is
available in the databases to be linked.
In the absence of a unique identifier, the process is
based on similar personal identifiers (e.g., name, sex,
date of birth, address)
This requires techniques that deal with problems such as
typographical errors or variations, time-sensitive data
(e.g., address), and large databases.
4. Record Linkage:
Data pre-processing: data cleaning, standardization of
codes and formats; parsing (name, address).
Indexing (Blocking): comparisons are restricted to
records that agree on a blocking key (e.g. soundex (first
name) + sex).
Comparison: approximate comparison functions (partial
agreement) produce a vector of numerical similarities.
Classification: rule-based, probabilistic, or
machine learning approaches.
Clerical review: manual inspection
(tedious and labour-intensive)
Evaluation: accuracy studies
Christen P, 2012
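The steps listed on this slide can be sketched end to end in a few lines of Python. This is only an illustrative sketch, not the software used by the author: the records, the field names (`name`, `sex`, `dob`), the 0.85 similarity threshold, and the use of `difflib.SequenceMatcher` as the approximate comparison function are all assumptions made for the example. The blocking key follows the slide (Soundex of the first name plus sex).

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Standard American Soundex letter groups.
_CODES = {ch: digit
          for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                                 ("l", "4"), ("mn", "5"), ("r", "6"))
          for ch in letters}

def soundex(name):
    """Four-character Soundex code, e.g. soundex("Robert") == "R163"."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":          # h and w do not separate repeated codes
            continue
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code             # vowels reset prev, so repeats are kept
    return (result + "000")[:4]

def blocking_key(rec):
    # Indexing (blocking): Soundex(first name) + sex, as on the slide.
    return soundex(rec["name"].split()[0]) + rec["sex"]

def compare(a, b):
    # Comparison: vector of numerical similarities (name is approximate).
    name_sim = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    dob_sim = 1.0 if a["dob"] == b["dob"] else 0.0
    return name_sim, dob_sim

def classify(vector, name_threshold=0.85):
    # Classification: a simple rule-based scheme (threshold is hypothetical).
    name_sim, dob_sim = vector
    if name_sim >= name_threshold and dob_sim == 1.0:
        return "match"
    if name_sim >= name_threshold:
        return "review"         # sent to clerical review
    return "non-match"

def link(file_a, file_b):
    index = defaultdict(list)
    for rec in file_b:
        index[blocking_key(rec)].append(rec)
    results = {}
    for a in file_a:
        for b in index.get(blocking_key(a), []):
            results[(a["id"], b["id"])] = classify(compare(a, b))
    return results

# Hypothetical, already pre-processed records.
file_a = [{"id": "a1", "name": "Maria Souza", "sex": "F", "dob": "1980-05-02"},
          {"id": "a2", "name": "Joao Pereira", "sex": "M", "dob": "1975-11-30"}]
file_b = [{"id": "b1", "name": "Maria Sousa", "sex": "F", "dob": "1980-05-02"},
          {"id": "b2", "name": "Mario Souza", "sex": "M", "dob": "1980-05-02"},
          {"id": "b3", "name": "Joao Pereyra", "sex": "M", "dob": "1975-11-03"}]

links = link(file_a, file_b)
print(links)
```

Blocking keeps the pair (a1, b2) from ever being compared because the sex differs, which is the point of indexing: fewer comparisons, at the cost of missing true matches whose blocking key contains an error.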
5. The Record Linkage Process:
...“For more than a decade, most of the
methodological research has been in the
computer science literature”...
…“Many applications are still in the
epidemiological or health informatics literature
with most individuals using government
health agency shareware
based on the Fellegi-Sunter model”...
William E Winkler, 2012.
6. The record linkage approaches most
frequently used in the Brazilian health
sector:
Probabilistic (Fellegi-Sunter model): uses approximate
comparison functions. Different weights are assigned
to each field based on its discriminating power and
vulnerability to error. A number of commercial and
open source software packages are available.
Deterministic: uses exact comparison functions and
rule-based classification approach. Rules are
developed based on expert knowledge. Specific
computer routines need to be developed for each
problem.
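To make the idea of field weights concrete, here is a minimal sketch of how agreement and disagreement weights are commonly computed under the Fellegi-Sunter model. The m-probability (chance that a field agrees in a true match, lowered by vulnerability to error) and the u-probability (chance that it agrees in a non-match, lowered by discriminating power) are hypothetical values chosen for illustration, not estimates from any Brazilian database.

```python
import math

# Hypothetical (m, u) probabilities per field:
#   m = P(field agrees | records are a true match)
#   u = P(field agrees | records are not a match)
FIELDS = {
    "name": (0.95, 0.01),
    "sex": (0.99, 0.50),
    "dob": (0.97, 0.003),
}

def field_weight(m, u, agrees):
    """Log2 likelihood-ratio weight for one field."""
    if agrees:
        return math.log2(m / u)
    return math.log2((1.0 - m) / (1.0 - u))

def total_weight(agreement):
    """Sum the field weights for a pattern such as {"name": True, ...}."""
    return sum(field_weight(*FIELDS[field], agrees)
               for field, agrees in agreement.items())

all_agree = total_weight({"name": True, "sex": True, "dob": True})
dob_differs = total_weight({"name": True, "sex": True, "dob": False})
print(round(all_agree, 2), round(dob_differs, 2))
```

Agreement on sex adds little weight because two unrelated records agree on it about half the time, whereas agreement on date of birth is strong evidence. Pairs are then classified by comparing the total weight against upper and lower thresholds, with the zone in between going to clerical review.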
10. Accuracy of a probabilistic record linkage strategy applied
to identify deaths among cases reported to the Brazilian
AIDS surveillance database*.
Study Population:
All AIDS cases reported in SINAN with date of diagnosis between 2002 and 2005
Imperfect gold standard:
Known death - case with a date of death recorded in the surveillance database (N
= 19,750).
Known alive - no date of death recorded in the surveillance database and found
registered in the laboratory database in 2006 (N = 38,675).
Gold Standard    Linkage: Dead    Linkage: Alive    Total
Dead             17301            2449              19750
Alive            155              38520             38675
Global
Sensitivity (Se) = 87.6%
Specificity (Sp) = 99.6%.
*Fonseca et al, CSP 26(7), 2010.
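The global sensitivity and specificity follow directly from the four cell counts reported on this slide. A short check, with function and variable names chosen only for this example:

```python
def sensitivity(true_pos, false_neg):
    # Share of gold-standard deaths that the linkage identified as dead.
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    # Share of gold-standard survivors that the linkage left as alive.
    return true_neg / (true_neg + false_pos)

se = sensitivity(true_pos=17301, false_neg=2449)
sp = specificity(true_neg=38520, false_pos=155)
print(f"Se = {se:.1%}, Sp = {sp:.1%}")  # Se = 87.6%, Sp = 99.6%
```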
11. Results of the Internal Validation Study*
Global
Sensitivity (Se) = 87.6%
Specificity (Sp) = 99.6%.
*Fonseca et al, CSP 26(7), 2010.
12. Impact of linkage errors on risk ratios:
In longitudinal mortality studies, linkage errors introduce
outcome misclassification, making risk ratio estimates prone to
bias.
Risk ratios will not be biased if all three of the following conditions hold:
(1) exposure and outcome misclassification errors are
independent;
(2) outcome misclassification is non-differential with
regard to exposure level;
(3) specificity is 100%.
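A small numerical sketch shows why specificity matters so much. Under independent, non-differential misclassification, the observed risk in a group is Se × (true risk) + (1 − Sp) × (1 − true risk); when Sp = 100%, the sensitivity cancels out of the ratio. The true risks below (10% in the exposed, 5% in the unexposed) are hypothetical, and the sensitivity and specificity are borrowed from the validation study only for illustration.

```python
def observed_risk(true_risk, se, sp):
    """Apparent risk after linkage errors: true positives plus false positives."""
    return se * true_risk + (1.0 - sp) * (1.0 - true_risk)

def observed_rr(risk_exposed, risk_unexposed, se, sp):
    # Same se and sp in both groups, i.e. non-differential misclassification.
    return (observed_risk(risk_exposed, se, sp)
            / observed_risk(risk_unexposed, se, sp))

# Hypothetical true risks, giving a true risk ratio of 2.0.
RISK_EXPOSED, RISK_UNEXPOSED = 0.10, 0.05

rr_perfect_sp = observed_rr(RISK_EXPOSED, RISK_UNEXPOSED, se=0.876, sp=1.0)
rr_imperfect_sp = observed_rr(RISK_EXPOSED, RISK_UNEXPOSED, se=0.876, sp=0.996)
print(round(rr_perfect_sp, 3), round(rr_imperfect_sp, 3))
```

With perfect specificity the observed risk ratio stays at about 2.0 even though sensitivity is only 87.6%. With a specificity of 99.6% it drops to roughly 1.92, a bias toward the null caused by the few false deaths added to both groups.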