In this presentation we explain the role of archetypes in facilitating data reuse and data quality evaluation. Our use case is a system for monitoring best practices during babies' first 1,000 days. What happens during the first 1,000 days is the foundation of optimum health, growth, and neurodevelopment across the lifespan. The presentation shows how to create a standardized, data-quality-assessed integrated data repository for reliable data reuse in the monitoring of best practices and in perinatal health research.
Data reuse and quality evaluation in archetype-based environments
1. Data reuse and quality evaluation
in archetype-based environments
2nd Arctic Conference on OpenEHR and Archetype-based clinical Information Systems
David Moner
damoca@veratech.es
18-19 January 2018, Tromsø
2. About this presentation
• Our use case will be a system for monitoring best practices during babies' first 1,000 days
• We want to focus on the role of archetypes for facilitating data reuse and data quality evaluation
3. The first 1,000 days
Image from www.first1000days.ie
4. The first 1,000 days
What happens during the first 1,000 days is the foundation of optimum health, growth, and neurodevelopment across the lifespan
• Health aspects to be documented
– Health problems and risks of newborn
– Breastfeeding and food introduction
– Pregnancy and delivery
5. Project description
• Purpose
– To create a standardized, data-quality-assessed integrated data repository for reliable data reuse in monitoring of best practices and research
• Pilot project to improve quality of perinatal information and care (2015)
6. Project description
• What have we done?
– Define standardized archetypes from gestation to two years old (1,000 days).
– Normalize existing data according to the archetypes, and import it into an integrated data repository.
– Define data quality assessment criteria and evaluate the quality of the integrated data.
– Define Best Practices Indicators (BPIs) for the monitoring of maternal and child care based on the data structure of the archetypes.
7. Project description
[Figure: project workflow. Source databases from Hospital Virgen del Castillo and Hospital 12 de Octubre (perinatal care; perinatal care, neonatal info. only; infant feeding, primary care) go through data extraction, mapping, and data transformation and standardization with ISO 13606 archetypes into the integrated data repository (IDR). Archetype-based queries over the IDR produce the set of maternal and child care Best Practices Indicators (BPIs). Data quality assessment (7 dimensions, using data quality dimensions and methods) is applied PRE-stnd. and POST-stnd. and produces data quality reports. Stages shown: definition of archetypes and standardization of data; data quality assessment; monitoring of BPIs.]
9. Archetype definition
• Domain
– First 1,000 days of the baby and pregnancy and delivery of the mother
• Reference model
– ISO 13606:2008
• Team
– 6 health professionals (3 perinatal experts)
– 2 experts in archetypes and information standards
• Experiences in the literature
– Jasmin Buck, et al. Towards a comprehensive electronic patient record to
support an innovative individual care concept for premature infants using the
openEHR approach, Int J of Med Inf, Volume 78, Issue 8, 2009, Pages 521-531
– Christina Pahl, et al. Role of OpenEHR as an open source solution for the
regional modelling of patient data in obstetrics, J of Biomed Inf, Volume 55,
2015, Pages 174-187
10. Archetype definition
• Most archetypes were new implementations
• Some were reused from the Spanish national EHR
– Demographic archetypes, medications, problems, lab results…
• Some were reused from openEHR CKM, and
reimplemented in ISO 13606
– openEHR-EHR-OBSERVATION.apgar.v1
– openEHR-EHR-EVALUATION.health_risk.v1
11. Archetype definition
• COMPOSITION
– Pregnancy and birth report
– Newborn breastfeeding report
– Food introduction report
• SECTION
– 28 sections, mostly defined inside the Compositions
• ENTRY
– 44 archetypes
Archetypes available at: http://mm.linkehr.com/
13. Archetype definition
• Regarding terminologies, the work was limited to
the harmonization of local terms and their
mapping to the archetype list of terms
– E.g. type of anesthesia
• Terminology bindings were (once again) a victim
of time constraints and lack of terminology
experts
14. Data collection and normalization
• Data from two hospitals were normalized and
integrated
– Hospital Virgen del Castillo, Murcia
– Hospital 12 de Octubre, Madrid
• Over 270 different data items were extracted
from the original databases
– Data was provided as plain XML or CSV by the
informatics service of each hospital
• 7,672 XML ISO 13606 instances (one per child)
were generated and stored in the repository
15. Data collection and normalization
• Data was transformed into compliant ISO 13606 XML extracts using LinkEHR Studio
[Figure: data transformation step, labeled "XQuery"]
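As a rough illustration of this normalization step, the sketch below turns CSV rows into one XML document per child using only the Python standard library. It is not how LinkEHR Studio works internally and it does not emit schema-valid ISO 13606 XML: the element names (`extract`, `composition`, `element`), the CSV columns, and the archetype identifier are all hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical mapping from source CSV columns to archetype element names.
COLUMN_TO_ELEMENT = {
    "peso_nacimiento": "birth_weight",
    "edad_gestacional": "gestational_age_weeks",
}

# Hypothetical archetype identifier, used only for illustration.
ARCHETYPE_ID = "CEN-EN13606-COMPOSITION.pregnancy_birth_report.v1"


def row_to_extract(row, archetype_id):
    """Build one simplified XML extract (one per child) from a CSV row."""
    extract = ET.Element("extract", {"subject": row["id_nino"]})
    composition = ET.SubElement(
        extract, "composition", {"archetype_id": archetype_id}
    )
    for column, element_name in COLUMN_TO_ELEMENT.items():
        value = (row.get(column) or "").strip()
        if value:  # skip empty source cells instead of emitting empty elements
            item = ET.SubElement(composition, "element", {"name": element_name})
            item.text = value
    return extract


def csv_to_extracts(csv_text, archetype_id):
    """Parse CSV text and return one extract per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row_to_extract(row, archetype_id) for row in reader]


SAMPLE_CSV = (
    "id_nino,peso_nacimiento,edad_gestacional\n"
    "001,3250,39\n"
    "002,,40\n"
)
extracts = csv_to_extracts(SAMPLE_CSV, ARCHETYPE_ID)
for extract in extracts:
    print(ET.tostring(extract, encoding="unicode"))
```

Empty source cells are dropped rather than written as empty elements, so that a later completeness check can tell "missing" apart from "present".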
16. Data repository
• The data repository was implemented using
eXistdb
– Focus on fast prototyping, not performance
• Configuration:
– One collection per composition type
– Indexes over all paths (by default in eXistdb), all
archetype_id nodes, and object names
17. Data quality analysis
• High data quality
– It correctly represents the real-world construct to
which it refers
– It is fit for its intended uses
• Poor data quality has a serious impact on the
reuse of data for clinical trials, research, public
health, health policy development, etc.
18. Data quality analysis
• We have developed qualize to evaluate the
quality of aggregated data
• It is an online service that helps in the
evaluation of biomedical data quality
– Automating as much as possible the evaluation
process
– Offering a quantifiable data quality score
www.qualize.net
19. Data quality analysis
• qualize evaluates seven dimensions of data
quality:
– Uniqueness. Are there replicated data?
– Completeness. Are there missing data?
– Correctness. Are there unexpectedly anomalous
records?
– Consistency. Do my data comply with established rules?
Formats, ranges ...
– Temporal stability. Is there variability in my data over
time?
– Multi-source stability. Is there variability in my data
depending on their origin?
– Predictive value. Can I build decision support systems
from my data?
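The simplest of these dimensions, uniqueness, can be sketched in a few lines. This is only an assumed reading of the question "are there replicated data?", taken here as the share of records whose identifier appears exactly once. It is not the qualize implementation, and the identifiers are made up.

```python
from collections import Counter


def uniqueness_score(identifiers):
    """Share of records whose identifier is not replicated (assumed definition)."""
    if not identifiers:
        return 1.0  # an empty dataset has no replicated identifiers
    counts = Counter(identifiers)
    non_replicated = sum(1 for ident in identifiers if counts[ident] == 1)
    return non_replicated / len(identifiers)


# Hypothetical child identifiers; "b" is replicated.
sample_ids = ["a", "b", "b", "c"]
sample_uniqueness = uniqueness_score(sample_ids)
print(f"Uniqueness: {sample_uniqueness:.0%}")
```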
21. Data quality analysis
• Completeness and consistency are based on
archetype constraints
– We use those constraints to generate evaluation
rules in Schematron
– The evaluation of those rules over existing data
instances provides a quality score for each
dimension
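To make the idea of deriving rules from constraints concrete, here is a minimal sketch that writes existence checks as a Schematron document. The constraint list, element names, and messages are hypothetical simplifications, not the rule set used in the project. Only the standard Schematron vocabulary (`schema`, `pattern`, `rule`, `assert`) is used, and the sketch stops at generating the rules, since evaluating them needs a Schematron processor.

```python
import xml.etree.ElementTree as ET

SCH_NS = "http://purl.oclc.org/dsdl/schematron"  # ISO Schematron namespace

# Hypothetical, simplified archetype constraints:
# (context element, child element, is the child mandatory?)
CONSTRAINTS = [
    ("composition", "birth_weight", True),
    ("composition", "type_of_anesthesia", False),
]


def constraints_to_schematron(constraints):
    """Generate one existence assertion per constrained child element."""
    ET.register_namespace("sch", SCH_NS)
    schema = ET.Element(f"{{{SCH_NS}}}schema")
    pattern = ET.SubElement(schema, f"{{{SCH_NS}}}pattern", {"id": "completeness"})
    rules = {}
    for context, child, mandatory in constraints:
        # Within a pattern only the first rule matching a node fires,
        # so all assertions for the same context share a single rule.
        if context not in rules:
            rules[context] = ET.SubElement(
                pattern, f"{{{SCH_NS}}}rule", {"context": context}
            )
        kind = "mandatory" if mandatory else "optional"
        check = ET.SubElement(
            rules[context], f"{{{SCH_NS}}}assert", {"test": f"count({child}) >= 1"}
        )
        check.text = f"{kind} element '{child}' is missing"
    return ET.tostring(schema, encoding="unicode")


schematron_text = constraints_to_schematron(CONSTRAINTS)
print(schematron_text)
```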
22. Data quality analysis
• Completeness: test the existence, or not, of
each attribute
– FPI: Formal Public Identifier (rule identifier)
– Each type of rule is afterwards weighted
• E.g. an empty optional element is not as important for the
total completeness score as an empty mandatory one
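A weighted completeness score of this kind could be computed as in the sketch below. The weights, the rule list, and the simplified XML instances are assumptions chosen for illustration; the slides do not state the actual weights or the scoring formula.

```python
import xml.etree.ElementTree as ET

# Assumed weights: a missing mandatory element counts more than a missing optional one.
WEIGHTS = {"mandatory": 1.0, "optional": 0.25}

# Hypothetical completeness rules: (element name, kind of rule).
RULES = [
    ("birth_weight", "mandatory"),
    ("apgar_1min", "mandatory"),
    ("type_of_anesthesia", "optional"),
]


def completeness_score(instances, rules, weights):
    """Weighted share of expected elements that are present and non-empty."""
    achieved = 0.0
    possible = 0.0
    for instance in instances:
        for name, kind in rules:
            possible += weights[kind]
            node = instance.find(f".//element[@name='{name}']")
            if node is not None and (node.text or "").strip():
                achieved += weights[kind]
    return achieved / possible if possible else 0.0


# Two hypothetical instances in a simplified, non-standard XML layout.
instances = [
    ET.fromstring(
        "<composition>"
        "<element name='birth_weight'>3250</element>"
        "<element name='apgar_1min'>9</element>"
        "</composition>"
    ),
    ET.fromstring(
        "<composition>"
        "<element name='birth_weight'></element>"
        "<element name='apgar_1min'>8</element>"
        "<element name='type_of_anesthesia'>epidural</element>"
        "</composition>"
    ),
]
score = completeness_score(instances, RULES, WEIGHTS)
print(f"Completeness: {score:.2%}")
```

With these weights the first instance loses only 0.25 for its missing optional element, while the second loses a full point for its empty mandatory one.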
23. Data quality analysis
• Consistency: check archetype constraints,
including terminology subsets
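A consistency check against value ranges and terminology subsets might look like the following sketch. The element names, the anesthesia codes, and the scoring rule (share of records with no violation) are assumptions; only the 0 to 10 range of the Apgar score is a known fact.

```python
# Hypothetical constraints taken from imagined archetypes.
CONSTRAINTS = {
    "apgar_1min": {"range": (0, 10)},  # the Apgar score runs from 0 to 10
    "type_of_anesthesia": {"codes": {"none", "local", "epidural", "general"}},
}


def find_violations(record, constraints):
    """Return a list of messages for values that break a constraint."""
    violations = []
    for name, value in record.items():
        rule = constraints.get(name)
        if rule is None or value is None:
            continue  # unconstrained or missing values are not consistency errors
        if "range" in rule:
            low, high = rule["range"]
            if not isinstance(value, (int, float)) or not low <= value <= high:
                violations.append(f"{name}={value!r} outside [{low}, {high}]")
        if "codes" in rule and value not in rule["codes"]:
            violations.append(f"{name}={value!r} not in the terminology subset")
    return violations


def consistency_score(records, constraints):
    """Share of records without any violation (assumed scoring rule)."""
    if not records:
        return 1.0
    clean = sum(1 for record in records if not find_violations(record, constraints))
    return clean / len(records)


sample_records = [
    {"apgar_1min": 9, "type_of_anesthesia": "epidural"},
    {"apgar_1min": 12, "type_of_anesthesia": "local"},     # out of range
    {"apgar_1min": 7, "type_of_anesthesia": "unknown_x"},  # code not in subset
]
sample_consistency = consistency_score(sample_records, CONSTRAINTS)
print(f"Consistency: {sample_consistency:.2%}")
```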
24. Data quality analysis
H12O = Hospital 12 de Octubre, HVC = Hospital Virgen del Castillo. PRE-stnd. and POST-stnd. are the datasets before and after standardization.

DQ dimension            H12O perinatal health   HVC perinatal health    HVC infant feeding
                        PRE-stnd.   POST-stnd.  PRE-stnd.   POST-stnd.  PRE-stnd.   POST-stnd.
                        n=1949      n=1948      n=3781      n=3776      n=2133      n=2133
Uniqueness              100%        100%        100%        100%        100%        100%
Completeness            76.71%      8.44%       56.60%      18.03%      98.73%      29.65%
Consistency             -           100%        -           100%        -           100%
Temporal stability      2           1           3           3           1           1
Multi-source stability  0.08 (among hospitals)                          0.79 (among professionals)
Correctness             3.39%       0.62%       1.38%       0.45%       0.09%       0.19%
Predictive value        Not applicable                                  0.60

– Uniqueness: non-replicated identifiers
– Completeness: non-missing data, weighting obligatory and optional elements
– Consistency: conformance to basic schema rules
– Temporal stability: data concordance over time
– Multi-source stability: data concordance among different sources (1-GPD metric)
– Correctness: possibly anomalous records
– Predictive value: usefulness of data to predict breastfeeding at one month
25. Data quality analysis
Hospital Virgen del Castillo, infant feeding dataset
Stability of 0.79 among primary care professionals (1 - Global Probabilistic Deviation metric)
26. Best practices indicators
• We defined 127 best practice indicators based on the archetypes
– Compiled from national and international recommendations: Euro-Peristat Network, WHO, UNICEF, etc.
• Grouped in seven main categories:
– Central indicators
– Maternal history
– Obstetric conditions
– Obstetric environment
– Obstetric interventions
– Baby health status
– Breastfeeding and infant-feeding
• Each best practice indicator includes:
– Rationale
– Readable definition and operational definition
– Location of variables in the archetypes
27. Best practices indicators
28. Best practices indicators
INDICATOR 6.14
% babies with exclusive breastfeeding at the age of one month
Numerator: Number of babies with exclusive breastfeeding
Numerator filter: (Rec24h_Lactancia = 1) and (Rec24h_LecheFórmula = 2) and (Rec24h_Líquidos = 2) and (Rec24h_Sólidos = 2) and (EI_CerealesSG > 1) and (EI_CerealesCG > 1) and (EI_Fórmula > 1) and (EI_Frutas > 1) and (EI_Huevo > 1) and (EI_LecheVaca > 1) and (EI_Legumbres > 1) and (EI_Líquidos > 1) and (EI_Pescado > 1) and (EI_Pollo > 1) and (EI_Verduras > 1) and (EI_Yema > 1) and (EI_Yogur > 1) and (Tipo_Lactancia = 1)
Denominator: Total number of babies at the age of one month
Denominator filter: (DATEDIFF ( m , "Fecha_Nacimiento" , "Fecha_TomaDatos" )) = 1
29. Best practices indicators
• Six indicators were monitored in this pilot for
both hospitals, implemented in XQuery
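As an illustration of what computing such an indicator involves, the sketch below evaluates indicator 6.14 over flat records in Python rather than XQuery. The variable names are copied from the indicator's filters. Everything else is an assumption: that records are plain dictionaries, that the numerator is counted within the denominator population, and that `DATEDIFF(m, ...)` follows SQL Server semantics (number of month boundaries crossed). The sample records are invented.

```python
from datetime import date

# "EI_" variables copied from the numerator filter of indicator 6.14.
EI_FIELDS = [
    "EI_CerealesSG", "EI_CerealesCG", "EI_Fórmula", "EI_Frutas", "EI_Huevo",
    "EI_LecheVaca", "EI_Legumbres", "EI_Líquidos", "EI_Pescado", "EI_Pollo",
    "EI_Verduras", "EI_Yema", "EI_Yogur",
]


def months_between(start, end):
    """Month boundaries crossed, as DATEDIFF(m, start, end) counts in SQL Server."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def in_denominator(record):
    return months_between(record["Fecha_Nacimiento"], record["Fecha_TomaDatos"]) == 1


def in_numerator(record):
    return (
        record["Rec24h_Lactancia"] == 1
        and record["Rec24h_LecheFórmula"] == 2
        and record["Rec24h_Líquidos"] == 2
        and record["Rec24h_Sólidos"] == 2
        and all(record[field] > 1 for field in EI_FIELDS)
        and record["Tipo_Lactancia"] == 1
    )


def indicator_6_14(records):
    """Percentage of one-month-old babies with exclusive breastfeeding."""
    denominator = [r for r in records if in_denominator(r)]
    if not denominator:
        return None  # indicator undefined without one-month-old babies
    # Assumption: the numerator is counted within the denominator population.
    numerator = [r for r in denominator if in_numerator(r)]
    return 100.0 * len(numerator) / len(denominator)


def make_record(born, collected, overrides=None):
    """Hypothetical helper: a record that passes the numerator filter unless overridden."""
    record = {
        "Fecha_Nacimiento": born, "Fecha_TomaDatos": collected,
        "Rec24h_Lactancia": 1, "Rec24h_LecheFórmula": 2,
        "Rec24h_Líquidos": 2, "Rec24h_Sólidos": 2, "Tipo_Lactancia": 1,
    }
    record.update({field: 2 for field in EI_FIELDS})
    record.update(overrides or {})
    return record


records = [
    make_record(date(2017, 3, 10), date(2017, 4, 12)),
    make_record(date(2017, 3, 10), date(2017, 4, 12), {"Rec24h_LecheFórmula": 1}),
    make_record(date(2017, 1, 5), date(2017, 4, 12)),  # outside the denominator
]
result = indicator_6_14(records)
print(result)
```

Counting month boundaries means a baby born on 31 March and surveyed on 1 April also has a difference of 1, which is worth keeping in mind when reading the denominator.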
31. Improvement of protocols and information systems
PREVIOUS VERSION
Gestational age was recorded in two
fields (weeks and days)
NEW VERSION
Gestational age is recorded automatically based
on last period date, expected delivery date, and
current date
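The automatic calculation in the new version can be sketched as follows. The function and parameter names are hypothetical, and the precedence between the two reference dates is an assumption (the slide does not say which one wins when both are available). The 280-day pregnancy length is Naegele's rule, which dates expected delivery at 40 weeks after the last menstrual period.

```python
from datetime import date, timedelta

FULL_TERM_DAYS = 280  # Naegele's rule: expected delivery = last period + 280 days


def gestational_age(current, last_period=None, expected_delivery=None):
    """Return gestational age on `current` as a (weeks, days) tuple.

    Assumption: the last period date is used when available, otherwise
    the age is derived backwards from the expected delivery date.
    """
    if last_period is not None:
        elapsed = (current - last_period).days
    elif expected_delivery is not None:
        elapsed = FULL_TERM_DAYS - (expected_delivery - current).days
    else:
        raise ValueError("need last_period or expected_delivery")
    if elapsed < 0:
        raise ValueError("current date is before the start of gestation")
    return divmod(elapsed, 7)


last_period = date(2017, 1, 1)
expected_delivery = last_period + timedelta(days=FULL_TERM_DAYS)
today = date(2017, 10, 1)

from_last_period = gestational_age(today, last_period=last_period)
from_delivery_date = gestational_age(today, expected_delivery=expected_delivery)
print(from_last_period, from_delivery_date)
```

Deriving weeks and days from dates removes the two manually typed fields of the previous version, along with the chance of them disagreeing with each other.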
32. Results
• Change in healthcare protocols for births
• Improvement of perinatal information systems
• Increased rates of breastfeeding and its
duration
• Reduced use of antibiotics in children
• Leading pilot project towards a Spanish data
quality-assured repository of maternal-child
information
33. Results
• Number of records
• Technology stack
– Persistence: eXist, could be replaced by Marand or
EHRserver
34. Lessons learned
• Dealing with the difference between health
data of the mother and her children.
• Limited terminology bindings
– As usual, the semantic definition of archetypes is
always the first victim of time constraints
35. What’s next?
• Graph databases?
• Add subjective items
• How well do you feel with the received
– Quality of patient experience