1st Symposium on Big Data and Public Health - 2013

Linking Health Records for
Population Health Research in Brazil.

Cláudia Medina Coeli

UFRJ
Labmecs
Record Linkage:
The process of identifying and merging records across
different databases that correspond to the same entity
(for example, the same individual).
This process creates a new database that has more
variables than each single database linked.
It can also be used to identify records that refer to the
same entity within a single database, i.e. for
deduplication (removing duplicate records or
merging them into a combined record).

Record Linkage:
Record linkage is made relatively easy when a unique
identifier, such as a health insurance number, is
available in the databases to be linked.
In the absence of a unique identifier, the process is
based on similar personal identifiers (e.g., name, sex,
date of birth, address).
This requires techniques that deal with problems such as
typographical errors or variations, time-sensitive data
(e.g., address), and large databases.
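
As an illustration of how typographical errors and spelling variations can be tolerated, the sketch below standardizes two names and scores their similarity. It is only a minimal example: the helper names `normalize` and `similarity` are hypothetical, and `difflib.SequenceMatcher` from the Python standard library is used as a stand-in for the specialised string comparators found in record linkage software.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Upper-case a name, strip accents and collapse repeated whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.upper().split())


def similarity(a: str, b: str) -> float:
    """Return a similarity score between 0.0 and 1.0 for two names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


# Names that differ only in case and accents compare as identical,
# while a one-letter spelling variation still scores close to 1.
print(similarity("José da Silva", "JOSE DA SILVA"))
print(similarity("Maria Souza", "Maria Sousa"))
```

A partial-agreement score of this kind is what allows a pair to count as "almost the same name" instead of being rejected outright by an exact comparison.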

Record Linkage:
Data pre-processing: data cleaning, standardization of
codes and formats; parsing (name, address).
Indexing (blocking): comparisons are restricted to
records that agree on a blocking key (e.g., Soundex of first
name + sex).
Comparison: approximate comparison
functions (partial agreement) produce a vector of
numerical similarities.
Classification: rule-based, probabilistic, or
machine learning approaches.
Clerical review: manual inspection
(tedious and labour-intensive).
Christen P, 2012

Evaluation: accuracy studies
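
The indexing step can be sketched as follows, using the blocking key given as an example above (Soundex of first name + sex). This is a minimal sketch under stated assumptions: the record layout (`first_name`, `sex`, `id`) and the function names are hypothetical, names are assumed to be already standardized to unaccented letters, and the Soundex variant is the common American one.

```python
from collections import defaultdict

# Letter-to-digit table of American Soundex; vowels, H, W and Y have no code.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name: str) -> str:
    """Return the four-character Soundex code of a name ('' if no letters)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        if ch not in "HW":  # H and W do not separate letters with equal codes
            prev = digit
    return (code + "000")[:4]


def blocking_key(record: dict) -> tuple:
    """Blocking key: Soundex of the first name plus sex."""
    return (soundex(record["first_name"]), record["sex"])


def candidate_pairs(file_a: list, file_b: list):
    """Yield only the record pairs that agree on the blocking key."""
    index = defaultdict(list)
    for rec_b in file_b:
        index[blocking_key(rec_b)].append(rec_b)
    for rec_a in file_a:
        for rec_b in index.get(blocking_key(rec_a), []):
            yield rec_a, rec_b


# Hypothetical records, for illustration only.
file_a = [{"id": "a1", "first_name": "Claudia", "sex": "F"},
          {"id": "a2", "first_name": "Marcos", "sex": "M"}]
file_b = [{"id": "b1", "first_name": "Claudya", "sex": "F"},
          {"id": "b2", "first_name": "Claudia", "sex": "M"},
          {"id": "b3", "first_name": "Marcus", "sex": "M"}]

pairs = [(a["id"], b["id"]) for a, b in candidate_pairs(file_a, file_b)]
print(pairs)
```

Only the pairs that survive blocking are passed on to the comparison and classification steps, which is what keeps the number of comparisons manageable for large databases. The trade-off is that true matches disagreeing on the blocking key are never compared.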
The Record Linkage Process:
...“For more than a decade, most of the methodological research has been in the
computer science literature”...
...“Many applications are still in the epidemiological or health informatics
literature with most individuals using government health agency shareware
based on the Fellegi-Sunter model”...

William E Winkler, 2012.

The record linkage approaches most
frequently used in the Brazilian health
sector:

Probabilistic (Fellegi-Sunter model): uses approximate
comparison functions. Different weights are assigned
to each field based on its discriminating power and
vulnerability to error. A number of commercial and
open source software packages are available.
Deterministic: uses exact comparison functions and
a rule-based classification approach. Rules are
developed based on expert knowledge. Specific
computer routines need to be developed for each
problem.
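
The field weighting of the probabilistic approach can be sketched as below. In the Fellegi-Sunter model each field has an m-probability (agreement among true matches, which reflects vulnerability to error) and a u-probability (agreement among non-matches, which reflects discriminating power). The m and u values, the thresholds and the function names here are hypothetical and chosen only for illustration; in practice they are estimated from the data.

```python
from math import log2

# Hypothetical (m, u) probabilities per field, for illustration only.
PARAMS = {
    "name": (0.95, 0.01),
    "sex": (0.98, 0.50),
    "birth_date": (0.90, 0.003),
}


def field_weight(agrees: bool, m: float, u: float) -> float:
    """Agreement weight log2(m/u), or disagreement weight log2((1-m)/(1-u))."""
    return log2(m / u) if agrees else log2((1 - m) / (1 - u))


def match_score(agreement: dict) -> float:
    """Sum the field weights for one candidate pair."""
    return sum(field_weight(agreement[f], m, u) for f, (m, u) in PARAMS.items())


def classify(score: float, lower: float, upper: float) -> str:
    """Two thresholds split pairs into links, non-links and a grey zone."""
    if score >= upper:
        return "link"
    if score <= lower:
        return "non-link"
    return "clerical review"


# A pair that agrees on name and sex but disagrees on date of birth.
score = match_score({"name": True, "sex": True, "birth_date": False})
print(score, classify(score, lower=0.0, upper=10.0))
```

With these assumed values, agreement on sex adds little to the score because two unrelated records agree on sex half of the time, whereas agreement on name or date of birth is far more informative.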
Classification model:

Probabilistic

Rule-based
Software:
Febrl

Reclink/OpenRecLink

LinkPlus

Open Source Record Matching

OpenRecLink:
Open source (http://reclink.sourceforge.net/);
Multi-platform;
Multiple language support;
New database back-end;
PostgreSQL integration;
New deduplication routine;
Better performance (64-bit Ubuntu Linux).

Accuracy of a probabilistic record linkage strategy applied
to identify deaths among cases reported to the Brazilian
AIDS surveillance database*.
Study population:
All AIDS cases reported in SINAN with date of diagnosis between 2002 and 2005.
Imperfect gold standard:
Known dead: cases with a date of death recorded in the surveillance database
(N = 19,750).
Known alive: cases with no date of death recorded in the surveillance database and found
registered in the laboratory database in 2006 (N = 38,675).

Linkage result versus gold standard:

Gold standard    Linkage: Dead    Linkage: Alive     Total
Dead                    17,301             2,449    19,750
Alive                      155            38,520    38,675

Global
Sensitivity (Se) = 87.6%
Specificity (Sp) = 99.6%
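
The global sensitivity and specificity follow directly from the four counts reported above. The short check below assumes that the totals 19,750 and 38,675 are the gold standard groups (known dead and known alive), which is the only reading consistent with the reported values; the variable names are illustrative.

```python
# Counts from the validation table (gold standard group x linkage result).
dead_linked_as_dead = 17301    # true positives
dead_linked_as_alive = 2449    # false negatives (missed deaths)
alive_linked_as_dead = 155     # false positives (false links)
alive_linked_as_alive = 38520  # true negatives

sensitivity = dead_linked_as_dead / (dead_linked_as_dead + dead_linked_as_alive)
specificity = alive_linked_as_alive / (alive_linked_as_alive + alive_linked_as_dead)

print(f"Se = {sensitivity:.1%}")  # Se = 87.6%
print(f"Sp = {specificity:.1%}")  # Sp = 99.6%
```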


*Fonseca et al, CSP 26(7), 2010.
Results of the Internal Validation Study*

Global
Sensitivity (Se) = 87.6%
Specificity (Sp) = 99.6%.

*Fonseca et al, CSP 26(7), 2010.

Impact of linkage errors on risk ratios:
In longitudinal mortality studies, linkage errors introduce
outcome misclassification, making risk ratio estimates prone to
bias.
Risk ratios will not be biased if all three conditions hold:
(1) exposure and outcome misclassification errors are
independent;
(2) outcome misclassification is non-differential with
regard to the exposure levels;
(3) specificity is 100%.
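
The role of condition (3) can be illustrated numerically. Under non-differential outcome misclassification, the expected observed risk in a group with true risk p is Se * p + (1 - Sp) * (1 - p). The sketch below uses hypothetical true risks (20% among exposed, 10% among unexposed), assumes exposure is measured without error, and borrows the Se and Sp values of the validation study only as plausible inputs.

```python
def observed_risk(true_risk: float, se: float, sp: float) -> float:
    """Expected observed risk when the outcome is misclassified."""
    return se * true_risk + (1 - sp) * (1 - true_risk)


def observed_rr(risk_exposed: float, risk_unexposed: float,
                se: float, sp: float) -> float:
    """Expected observed risk ratio with the same Se and Sp in both groups."""
    return (observed_risk(risk_exposed, se, sp)
            / observed_risk(risk_unexposed, se, sp))


# Hypothetical true risks: the true risk ratio is 2.0.
true_rr = 0.20 / 0.10

# Perfect specificity: Se cancels out and the risk ratio is preserved.
rr_perfect_sp = observed_rr(0.20, 0.10, se=0.876, sp=1.0)

# Imperfect specificity: false links pull the estimate towards 1.
rr_imperfect_sp = observed_rr(0.20, 0.10, se=0.876, sp=0.996)

print(true_rr, rr_perfect_sp, rr_imperfect_sp)
```

With Sp = 100% the observed risks are simply Se times the true risks, so missed links reduce both risks proportionally and leave their ratio unchanged. Any false links add the same small amount to both groups, which biases the ratio towards the null.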


http://www.ihdln.org
Thank you.
Laboratório de Métodos Epidemiológicos, Estatísticos e
Computacionais em Saúde (LABMECS/IESC/UFRJ)

http://www.iesc.ufrj.br/posgrad/posgraduacao/

coeli@iesc.ufrj.br

The Most Excellent Way | 1 Corinthians 13
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website App
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 

Cláudia Medina Coeli: Linking Health Records for Population Health Research in Brazil.

  • 1. 1st Symposium on Big Data and Public Health - 2013 Linking Health Records for Population Health Research in Brazil. Cláudia Medina Coeli UFRJ Labmecs
  • 2. Record Linkage: The process of identifying and merging records across different databases that correspond to the same entity (for example, the same individual). This process creates a new database that has more variables than any single database linked. It can also be used to identify records that refer to the same entity within a single database, i.e. for deduplication (removing duplicate records or merging them into a combined record).
  • 3. Record Linkage: Record linkage is relatively easy when a unique identifier, such as a health insurance number, is available in the databases to be linked. In the absence of a unique identifier, the process relies on personal identifiers that are similar across records (e.g., name, sex, date of birth, address). This requires techniques that deal with typographical errors or variations, time-sensitive data (e.g. address), and large databases.
  • 4. Record Linkage: Data pre-processing: data cleaning; standardization of codes and formats; parsing (name, address). Indexing (blocking): comparisons are restricted to records that agree on a blocking key (e.g. Soundex(first name) + sex). Comparison: approximate comparison functions (partial agreement) yield a vector of numerical similarities. Classification: rule-based, probabilistic, or machine learning approaches. Clerical review: manual inspection (tedious and labour-intensive). Evaluation: accuracy studies. (Christen P, 2012)
  • 5. The Record Linkage Process: ...“For more than a decade, most of the methodological research has been in the computer science literature”... ...“Many applications are still in the epidemiological or health informatics literature with most individuals using government health agency shareware based on the Fellegi-Sunter model”... William E Winkler, 2012.
  • 6. The record linkage approaches most frequently used in the Brazilian health sector: Probabilistic (Fellegi-Sunter model): uses approximate comparison functions. Different weights are assigned to each field based on its discriminating power and vulnerability to error. A number of commercial and open source software packages are available. Deterministic: uses exact comparison functions and a rule-based classification approach. Rules are developed based on expert knowledge. Specific computer routines need to be developed for each problem.
  • 9. OpenReclink: Open source (http://reclink.sourceforge.net/); multi-platform; multiple language support; new database back-end; PostgreSQL integration; new deduplication routine; better performance (Linux Ubuntu 64-bit).
  • 10. Accuracy of a probabilistic record linkage strategy applied to identify deaths among cases reported to the Brazilian AIDS surveillance database*. Study population: all AIDS cases reported in SINAN with date of diagnosis between 2002 and 2005. Imperfect gold standard: known dead, a case with a date of death recorded in the surveillance database (N = 19,750); known alive, a case with no date of death recorded in the surveillance database and found registered in the laboratory database in 2006 (N = 38,675).

        Gold standard    Linkage: dead    Linkage: alive    Total
        Dead                    17,301             2,449   19,750
        Alive                      155            38,520   38,675

        Global sensitivity (Se) = 87.6%; specificity (Sp) = 99.6%.
        *Fonseca et al, CSP 26(7), 2010.
  • 11. Results of the Internal Validation Study*: global sensitivity (Se) = 87.6%; specificity (Sp) = 99.6%. *Fonseca et al, CSP 26(7), 2010.
  • 12. Impact of linkage errors on risk ratios: In longitudinal mortality studies, linkage errors introduce outcome misclassification, making risk ratio estimates prone to bias. Risk ratios will not be biased if all three conditions hold: (1) exposure and outcome misclassification errors are independent; (2) the outcome misclassification is non-differential with regard to the exposure levels; (3) specificity is 100%.
  • 14. Thank you. Laboratório de Métodos Epidemiológicos, Estatísticos e Computacionais em Saúde (LABMECS/IESC/UFRJ) http://www.iesc.ufrj.br/posgrad/posgraduacao/ coeli@iesc.ufrj.br
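
The blocking, comparison and classification steps of slide 4, combined with the field weights described for the probabilistic approach on slide 6, might be sketched roughly as follows. This is only an illustration and not the OpenReclink implementation: the record layout, the m/u probabilities, the 0.85 name-similarity cut-off and the two score thresholds are all assumptions chosen for the example.

```python
import math
from collections import defaultdict
from difflib import SequenceMatcher

# Soundex digit for each consonant; vowels, H, W and Y carry no code.
SOUNDEX = {c: d for letters, d in [("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                                   ("L", "4"), ("MN", "5"), ("R", "6")] for c in letters}

# Hypothetical m/u probabilities per field (not taken from the slides):
# m = P(fields agree | same person), u = P(fields agree | different people).
M_U = {"name": (0.95, 0.01), "birth_date": (0.98, 0.003)}


def soundex(name):
    """American Soundex code (one letter plus three digits) of a name."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    code, prev = letters[0], SOUNDEX.get(letters[0], "")
    for ch in letters[1:]:
        digit = SOUNDEX.get(ch, "")
        if digit and digit != prev:
            code += digit
        if ch not in "HW":  # H and W do not separate repeated codes
            prev = digit
    return (code + "000")[:4]


def blocking_key(rec):
    """Blocking key from slide 4: Soundex(first name) + sex."""
    return soundex(rec["name"].split()[0]) + rec["sex"]


def compare(a, b):
    """Comparison vector: approximate agreement on name, exact on birth date."""
    sim = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    return {"name": sim >= 0.85, "birth_date": a["birth_date"] == b["birth_date"]}


def score(a, b):
    """Sum of log2 agreement/disagreement weights over the compared fields."""
    total = 0.0
    for field, agrees in compare(a, b).items():
        m, u = M_U[field]
        total += math.log2(m / u) if agrees else math.log2((1 - m) / (1 - u))
    return total


def classify(s, lower=0.0, upper=8.0):
    """Two thresholds; scores in between go to clerical review."""
    if s >= upper:
        return "link"
    if s <= lower:
        return "non-link"
    return "clerical review"


def link(file_a, file_b):
    """Compare only the record pairs that share a blocking key."""
    blocks = defaultdict(list)
    for rec in file_b:
        blocks[blocking_key(rec)].append(rec)
    return [(a["id"], b["id"], round(score(a, b), 2), classify(score(a, b)))
            for a in file_a for b in blocks.get(blocking_key(a), [])]


# Made-up records for illustration only.
file_a = [{"id": "a1", "name": "Maria Silva", "sex": "F", "birth_date": "1980-05-17"}]
file_b = [{"id": "b1", "name": "Maria Sylva", "sex": "F", "birth_date": "1980-05-17"},
          {"id": "b2", "name": "Mario Souza", "sex": "M", "birth_date": "1980-05-17"},
          {"id": "b3", "name": "Mara Costa", "sex": "F", "birth_date": "1975-01-02"}]

for pair in link(file_a, file_b):
    print(pair)
```

Record b2 never reaches the comparison step because its blocking key differs on sex. That is the point of blocking, and also its risk: an error in a blocking field hides a true match.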
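
The accuracy figures on slide 10 can be recomputed from the cell counts of its table. The sketch below assumes the table is read with the gold standard in rows and the linkage result in columns, which is the reading under which the row total of 19,750 matches the number of known deaths; the function name is illustrative.

```python
def sensitivity_specificity(tp, fn, fp, tn):
    """Return (sensitivity, specificity) from the four cells of a 2x2 table."""
    sensitivity = tp / (tp + fn)   # known deaths that the linkage found
    specificity = tn / (tn + fp)   # known-alive cases that the linkage left unlinked
    return sensitivity, specificity


# Cell counts from the slide 10 table (Fonseca et al, 2010).
se, sp = sensitivity_specificity(tp=17301, fn=2449, fp=155, tn=38520)
print(f"Se = {se:.1%}, Sp = {sp:.1%}")  # prints "Se = 87.6%, Sp = 99.6%"
```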
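
The conditions on slide 12 can be checked numerically. The sketch below assumes exposure is measured without error and that sensitivity and specificity are the same in both exposure groups (non-differential). The true risks of 10% and 5% are hypothetical values chosen for illustration; the sensitivity and specificity are those reported on slide 10.

```python
import math


def observed_risk(true_risk, se, sp):
    """Expected proportion classified as dead when the true risk is true_risk."""
    return se * true_risk + (1 - sp) * (1 - true_risk)


def observed_risk_ratio(risk_exposed, risk_unexposed, se, sp):
    """Risk ratio expected under non-differential outcome misclassification."""
    return observed_risk(risk_exposed, se, sp) / observed_risk(risk_unexposed, se, sp)


# Hypothetical true risks (not from the slides): true risk ratio of 2.0.
RISK_EXPOSED, RISK_UNEXPOSED = 0.10, 0.05
true_rr = RISK_EXPOSED / RISK_UNEXPOSED

# Perfect specificity: sensitivity cancels out of the ratio, so no bias.
rr_perfect_sp = observed_risk_ratio(RISK_EXPOSED, RISK_UNEXPOSED, se=0.876, sp=1.0)

# Imperfect specificity: false links add the same noise to both groups,
# which pulls the ratio towards 1.
rr_imperfect_sp = observed_risk_ratio(RISK_EXPOSED, RISK_UNEXPOSED, se=0.876, sp=0.996)

print(f"true RR = {true_rr:.3f}")
print(f"observed RR with Sp = 100%:  {rr_perfect_sp:.3f}")
print(f"observed RR with Sp = 99.6%: {rr_imperfect_sp:.3f}")
```

With perfect specificity, the missed deaths reduce the observed risk by the same factor in both groups, so the ratio is unchanged. This is why a conservative linkage threshold, which trades sensitivity for specificity, can be preferable when the goal is to estimate a risk ratio.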