Scalable architectures for phenotype libraries
Building (inter)national phenotype libraries
S39
Martin Chapman
King’s College London
#AMIA2023
AMIA 2023 Annual Symposium | amia.org 1
Disclosure
I and my spouse/partner have no relevant relationships with commercial interests to disclose.
Learning objectives
After participating in this session the learner should be better able to:
• Understand how the software architecture, definition structure and hosting mechanisms
behind a phenotype library affect the accessibility of hosted phenotypes, and thus their
impact.
Overview
‘The definitions in a phenotype library can only have an impact if they are accessible at scale.’
What do we mean by ‘accessible at scale’?
1. Can be downloaded from a library successfully by a large number of users (software
architecture)
2. Can be interpreted (and thus implemented) by a range of different users (with knowledge
of a range of different programming languages) working with a range of different datasets
(definition architecture)
3. Can be successfully located by a broad range of users (distribution architecture)
Running examples: Phenoflow and OHDSI
Throughout, I will refer to two phenotype libraries: one of our own (Phenoflow) and, for a broader
perspective, a popular third-party library developed by OHDSI [1].
Figure 1: Phenoflow phenotype library
Figure 2: OHDSI phenotype library overview
[1] I am not directly connected to OHDSI, nor am I an expert in their tools.
1. Software architecture
Building phenotype libraries
Phenotype libraries are, or use as a part of their
ecosystems, web applications.
As such, we have a choice about how we build
these applications.
If we don’t build the application in a suitable way,
we may fail to get phenotype definitions to people
when, for example, there is high demand on a library.
In other words, the library may fail to scale.
Research software vs. user software
Why may phenotype libraries not be built in a suitable way?
Phenotype libraries are often built by researchers, who may have experiences, preferences and
goals that differ from those that would lead to ensuring a library is scalable.
For example, researchers are often familiar with and favour the use of languages like Python,
which does not necessarily scale as well as languages like JavaScript running on the V8 engine.
Overall, there is often a tension between research software requirements and the requirements
of software that is suitable for (large numbers of) users.
Microservices
To try and balance these requirements we can consider how to structure our software.
Figure 3: Example customer information microservice. Newman, 2019
A microservice design approach suggests that a system should be separated into individual
communicating services (often via HTTP), each of which provides a single piece of overall
system functionality.
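As a minimal sketch of this idea, the service below does one job only: it returns a phenotype definition by identifier over HTTP. The `DEFINITIONS` store, the `/definitions/<id>` path and the default port are illustrative assumptions, not part of Phenoflow or any OHDSI tool.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Dict, Tuple

# Hypothetical in-memory store; a real service would sit in front of a database.
DEFINITIONS: Dict[str, Dict[str, str]] = {
    "covid19": {"id": "covid19", "name": "COVID-19"},
}


def lookup(path: str) -> Tuple[int, Dict[str, str]]:
    """Map a request path such as /definitions/covid19 to (status, body)."""
    parts = path.strip("/").split("/")
    if len(parts) == 2 and parts[0] == "definitions" and parts[1] in DEFINITIONS:
        return 200, DEFINITIONS[parts[1]]
    return 404, {"error": "definition not found"}


class DefinitionHandler(BaseHTTPRequestHandler):
    """HTTP wrapper around lookup(), the service's single responsibility."""

    def do_GET(self) -> None:
        status, body = lookup(self.path)
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8080) -> None:
    """Run the service until interrupted (the port is an arbitrary choice)."""
    HTTPServer(("localhost", port), DefinitionHandler).serve_forever()
```

Calling `serve()` starts the service; any other component, written in any language, could then fetch a definition with a plain HTTP GET request.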
Impact of microservices
Because of the modularity of a microservice architecture, each service can be built using a
different language, allowing languages to be combined.
Therefore, user-facing components can be built using scalable languages, leaving
researchers to build the remaining components in languages that best suit them.
We refer to this as technological heterogeneity.
We also gain the ability to replicate components (scalability), isolate components with long
execution times in order to ensure the remainder of the system is not affected (resilience) and
replace components with minimal impact to the rest of the system (replaceability).
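Replication can be sketched as a dispatcher that rotates requests across identical copies of one service and moves on to the next copy when one fails. This is a simplified illustration under assumed names (`ReplicatedService`, `call`); in practice this role is usually played by a load balancer or container orchestrator rather than application code.

```python
import itertools
from typing import Callable, List, Sequence


class ReplicatedService:
    """Round-robin dispatcher over identical replicas of one service."""

    def __init__(self, replicas: Sequence[Callable[[str], str]]) -> None:
        if not replicas:
            raise ValueError("at least one replica is required")
        self._replicas: List[Callable[[str], str]] = list(replicas)
        self._next = itertools.cycle(range(len(self._replicas)))

    def call(self, request: str) -> str:
        """Send the request to the next replica, failing over if it errors."""
        failures = 0
        # Try each replica at most once, starting from the next in rotation.
        for _ in range(len(self._replicas)):
            replica = self._replicas[next(self._next)]
            try:
                return replica(request)
            except Exception:
                failures += 1
        raise RuntimeError(f"all {failures} replicas failed")
```

Because callers only see `call`, replicas can be added, removed or replaced without changing the rest of the system.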
Phenoflow architecture
[Diagram components: Web Portal/API, Generator, Visualiser, Implementation Units, VC server, Author(s), User. Flow labels: customise; workflow, visualisation, implementation units; author, expand; data; workflow; visualisation.]
Figure 4: Phenoflow’s microservice architecture
Martin Chapman, Luke Rasmussen, et al. (2021). “Phenoflow: A Microservice Architecture for Portable Workflow-based Phenotype Definitions”. In: AMIA Joint Summits on Translational Science,
pp. 142–151
Phenoflow stack
OHDSI architecture
The OHDSI phenotype library leverages software like ATLAS, for which
several key architectural considerations have been made, particularly the
use of containerisation (Docker) through initiatives like Broadsea.
OHDSI software like HADES (a set of R packages for analytics) is also
Dockerised.
https://github.com/OHDSI/Broadsea
(Slight detour...) CONSULT I
To show the benefits of a microservice approach when developing research software, we can
briefly consider CONSULT, a decision-support system for stroke patients that was developed
under the paradigm.
Figure 5: CONSULT’s dashboard interface Figure 6: CONSULT’s chatbot interface
(Slight detour...) CONSULT II
[Diagram components: data sources (Blood pressure via Withings API; Pulse and Activity via Garmin API; Heart Rate / ECG via Medibiosense API; EHR via EMIS); Device Integration services (Withings, Garmin, Vitalpatch); Sensor-FHIR converter; EHR Integration (EMIS); EHR-FHIR converter; FHIR Health Data Server; Message Passer; Dialogue Manager; Authentication Server; Provenance Server; Data Miner; Argumentation Engine; Chat Server; UI backend; Tablet Browser; PC Browser. Flow labels: sensor data; EHR data; FHIR resources; processed patient data; patient data; substitution; credentials; processed patient data, goal; results; dialogue responses; data summaries, tips.]
Figure 7: CONSULT’s microservice architecture
Scalability in practice
CONSULT’s user-facing components are built using scalable languages, while its
remaining components are built using languages that are more traditionally found in
research software. Its components can also be replicated.
We tested the ability of the CONSULT architecture (specifically its sensor integration
components) to respond to high load and obtained positive results.
Martin Chapman, Abigail G-Medhin, et al. (2022). “Using Microservices to Design Patient-facing Research Software”. In: Proceedings of the IEEE 18th International Conference on e-Science (e-Science), pp. 44–54
[Bar chart: average responses (y-axis, 0 to 140000) for the monolithic and CONSULT architectures, split into Ok and Timeout.]
Figure 8: How CONSULT responds to high load vs.
an emulated monolithic architecture
Microservices in industry
The use of microservices in industrial settings (and in the wider software development
community) is commonplace.
However (at least in our experience) research software does not adopt industry paradigms like
this.
Wider goal: encourage the use of established software engineering techniques when developing
research software.
2. Definition architecture
Phenotype definition challenges
• Phenotype definitions come in lots of different forms (flowcharts, text descriptions,
weights for a classifier, etc.) and lack standardisation. This reduces intelligibility and thus
phenotypic reproducibility at scale (how broadly the logic intended by the definition author
can be accurately implemented).
We need standardised models to structure definitions.
• Computable phenotypes often don’t exist at all. This affects phenotypic portability (the
effort associated with implementing a definition is high, limiting its adoption at scale).
We need mechanisms for generating and storing computable forms of definitions.
Phenoflow’s definition model I
A new Common Workflow Language (CWL)-based model for the definition of a phenotype:
[Diagram: a step (number, group, id, description, type) with an Input (id, description) and an Output (id, description, extension), linked to implementation units A and B (each with path, language, params). Layers shown: Abstract, Functional, Computational.]
Figure 9: CWL-based definition model (step) and implementation units*.
*the bits of code actually executed by definitions structured under this model; separate from the model itself.
Phenoflow’s definition model II
Model is separated into layers:
• Abstract: Expresses the logic of a phenotype through a set of simple sequential, potentially
nested steps, each of which is annotated with multiple descriptions. Emphasis on
intelligibility.
• Functional: Specifies the metadata of entities passed between the operations within the
abstract layer, e.g., the format of an intermediate cohort.
• Computational: Defines an environment for the execution of one or more implementation
units (e.g. a script, data pipeline module, etc.) for each step in the abstract layer. Inherently
supports implementation by providing a template for development in any language.
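One way to picture the three layers is as a nested data structure. The sketch below uses the field names visible in Figure 9 (number, id, description, type; input and output with id, description and extension; implementation units with path, language and params), but the class layout and the `unit_for` helper are assumptions made for illustration and are not Phenoflow's actual CWL representation. The `group` field shown in the figure is omitted because its meaning is not described here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Port:
    """Functional layer: metadata of an entity passed between steps."""
    id: str
    description: str
    extension: str = ""


@dataclass
class ImplementationUnit:
    """Computational layer: one executable implementation of a step."""
    path: str
    language: str
    params: str = ""


@dataclass
class Step:
    """Abstract layer fields plus the functional and computational layers."""
    number: int
    id: str
    description: str
    type: str
    input: Port
    output: Port
    implementation_units: List[ImplementationUnit] = field(default_factory=list)

    def unit_for(self, language: str) -> ImplementationUnit:
        """Return the implementation unit written in the given language."""
        for unit in self.implementation_units:
            if unit.language == language:
                return unit
        raise KeyError(language)
```

Keeping the implementation units separate from the step's description is what allows the same logic to be offered in several languages at once.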
Phenoflow’s definition model III
Abstract layer (step):
  Number: 2
  Id: icd10
  Description: A case is identified in the presence of patients associated with the stated icd10 COVID-19 codes.
  Type: logic

Functional layer:
  Input: covid19_cohort (Potential covid19 cases.)
  Output: covid19_cases_icd10 (covid19 cases, as identified by icd10 coding.); extension: csv

Computational layer (implementation units):
  icd10.py (language: python, params: -)
    for row in csv_reader:
        newRow = row.copy()
        for cell in row:
            if [value for value in row[cell].split(",") if value in codes]:
                newRow["covid19"] = "CASE"
    ...
  icd10.js (language: javascript, params: -)
    for (row of csvData) {
        newRow = row.slice();
        for (cell of row) {
            if (cell.split(",").filter(code => codes.indexOf(code) > -1).length) {
                newRow.push("CASE");
    ...
Figure 10: Individual step of COVID-19 phenotype definition and implementation units.
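The Python implementation unit in Figure 10 is shown only in part. A self-contained sketch of the same idea follows: each row is copied, and marked as a case if any cell contains one of the target codes. The function names, the whitespace stripping, the handling of empty cells and the sample data are assumptions added for illustration; rows that do not match are left unchanged here because the figure does not show what the original does with them.

```python
import csv
import io
from typing import Dict, List, Set


def read_rows(csv_text: str) -> List[Dict[str, str]]:
    """Parse CSV text with a header row into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def flag_icd10_cases(rows: List[Dict[str, str]], codes: Set[str]) -> List[Dict[str, str]]:
    """Return copies of the rows, marking those containing any of the codes."""
    flagged = []
    for row in rows:
        new_row = row.copy()
        for cell in row:
            # A cell may hold several comma-separated codes (or be empty).
            values = (row[cell] or "").split(",")
            if any(value.strip() in codes for value in values):
                new_row["covid19"] = "CASE"
        flagged.append(new_row)
    return flagged


# Made-up sample data; U07.1 is the ICD-10 code for confirmed COVID-19.
SAMPLE = 'patient_id,codes\n1,"U07.1,J12.8"\n2,I10\n'
```

For example, `flag_icd10_cases(read_rows(SAMPLE), {"U07.1"})` marks the first patient as a case and leaves the second row as it was.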
Phenoflow definition parsing
In addition to providing an intelligible model that supports different implementations,
Phenoflow also actively parses definitions from a variety of sources (including the HDR UK
phenotype library) under this model, thus providing pre-made computable forms.
This also solves the ‘it’s only useful if it’s used’ issue often associated with any kind of model.
[Diagram: Phenoflow’s microservice architecture, repeated from Figure 4.]
OHDSI OMOP phenotypes (cohorts)
The phenotypes found in the OHDSI phenotype library also have an expected
structure, and thus exhibit many of the same benefits.
While this structure is tied to the OMOP CDM, there is ongoing work on
OMOP interoperability.
Figure 11: An OHDSI phenotype definition (cohort)
Phenoflow connectors
The Phenoflow model imposes a number of other constraints, including:
• The first step must be of a connector type (currently load or external), designed to extract
data from a data source without performing any processing on that data, and pass it to the
second step.
• Other steps in a definition must describe the logic of the phenotype; the types are currently
boolean logic and generic logic (supporting, for example, case exclusion).
[Diagram: OMOP and FHIR connectors, followed by Step 2 and Step 3.]
Together, these two elements of the model promote interoperability with a variety of different
data standards (including OMOP itself).
Martin Chapman, Luke V Rasmussen, et al. (2022). “Connecting computable phenotypes with multiple Health IT Standards using the Phenoflow library”. In: AMIA Clinical Informatics Conference
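The two constraints above can be sketched as a small validation routine. The type identifiers used here (`load`, `external`, `boolean`, `logic`) and the function name are assumptions based on the wording of this slide and the `logic` type shown in Figure 10; Phenoflow's real validation may differ.

```python
from typing import List

# Assumed identifiers for the step types named on the slide.
CONNECTOR_TYPES = {"load", "external"}
LOGIC_TYPES = {"boolean", "logic"}


def validate_step_types(step_types: List[str]) -> List[str]:
    """Return a list of problems; an empty list means the constraints hold."""
    if not step_types:
        return ["definition has no steps"]
    problems = []
    # Constraint 1: the first step must be a connector.
    if step_types[0] not in CONNECTOR_TYPES:
        problems.append(
            f"step 1 has type {step_types[0]!r}, expected a connector type"
        )
    # Constraint 2: every later step must describe phenotype logic.
    for position, step_type in enumerate(step_types[1:], start=2):
        if step_type not in LOGIC_TYPES:
            problems.append(
                f"step {position} has type {step_type!r}, expected a logic type"
            )
    return problems
```

Under these rules, swapping the data source only means swapping the first step: `["load", "logic", "boolean"]` and `["external", "logic", "boolean"]` both validate, and the logic steps are untouched.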
3. Distribution architecture
Finding definitions
The ability to locate a phenotype definition is also a key part of its accessibility at scale.
When designing a library, we have a choice about where to host it and which existing platforms
and technologies to potentially connect to.
These choices can impact the discoverability of the definitions hosted.
Version control systems I
OHDSI’s phenotype library is (in part) hosted
on GitHub, a remote version control system
(VCS).
This neatly provides a mechanism to distribute
phenotypes and ensure they can be located (in
line with the FAIR principles), while also providing
important library features such as versioning.
Martin Chapman, Shahzad Mumtaz, et al. (2021). “Desiderata for the development of next-generation phenotype libraries”. In: GigaScience 10.9, pp. 1–13
Figure 12: OHDSI’s phenotype library on GitHub
Version control systems II
Inspired by OHDSI’s approach, Phenoflow is being migrated to a VCS-backed (GitHub) architecture.
[Diagram components: API, Generator, Visualiser, GitHub, Author(s), User. Flow labels: query; link to workflow + implementation units and visualisation; author, expand; data; workflow; index workflows.]
Figure 13: Phenoflow’s new VCS-backed architecture
In doing so, we aim to leverage even more of the features provided by a VCS,
including the use of branches for different connectors.
Martin Chapman, Luke V Rasmussen, et al. (2023). “Using Version Control Systems to Support High-Quality Phenotype Definitions”. In: AMIA Joint Summits on Translational Science, p. 816
Summary
It is important that the definitions we host in phenotype libraries are accessible at scale.
Definitions are accessible at scale if they can be easily located, downloaded and interpreted by
large numbers of users.
We can ensure this is the case by carefully considering how we structure phenotype libraries and
the definitions they contain.
If definitions are accessible they can have an impact. They can, for example, be reused,
ultimately supporting reproducible research.
Thank you!

Principles of Health Informatics: Informatics skills - communicating, structu...Principles of Health Informatics: Informatics skills - communicating, structu...
Principles of Health Informatics: Informatics skills - communicating, structu...
 
Principles of Health Informatics: Models, information, and information systems
Principles of Health Informatics: Models, information, and information systemsPrinciples of Health Informatics: Models, information, and information systems
Principles of Health Informatics: Models, information, and information systems
 
Using AI to understand how preventative interventions can improve the health ...
Using AI to understand how preventative interventions can improve the health ...Using AI to understand how preventative interventions can improve the health ...
Using AI to understand how preventative interventions can improve the health ...
 
Using CWL to support EHR-based phenotyping
Using CWL to support EHR-based phenotypingUsing CWL to support EHR-based phenotyping
Using CWL to support EHR-based phenotyping
 
Phenoflow: An Architecture for Computable Phenotypes
Phenoflow: An Architecture for Computable PhenotypesPhenoflow: An Architecture for Computable Phenotypes
Phenoflow: An Architecture for Computable Phenotypes
 
Phenoflow 2021
Phenoflow 2021Phenoflow 2021
Phenoflow 2021
 

Recently uploaded

Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxDr.Ibrahim Hassaan
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPCeline George
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxEyham Joco
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPCeline George
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Mark Reed
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designMIPLM
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17Celine George
 
Quarter 4 Peace-education.pptx Catch Up Friday
Quarter 4 Peace-education.pptx Catch Up FridayQuarter 4 Peace-education.pptx Catch Up Friday
Quarter 4 Peace-education.pptx Catch Up FridayMakMakNepo
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
Romantic Opera MUSIC FOR GRADE NINE pptx
Romantic Opera MUSIC FOR GRADE NINE pptxRomantic Opera MUSIC FOR GRADE NINE pptx
Romantic Opera MUSIC FOR GRADE NINE pptxsqpmdrvczh
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceSamikshaHamane
 

Recently uploaded (20)

9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptx
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERP
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptx
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERP
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 
OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
 
Quarter 4 Peace-education.pptx Catch Up Friday
Quarter 4 Peace-education.pptx Catch Up FridayQuarter 4 Peace-education.pptx Catch Up Friday
Quarter 4 Peace-education.pptx Catch Up Friday
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
Romantic Opera MUSIC FOR GRADE NINE pptx
Romantic Opera MUSIC FOR GRADE NINE pptxRomantic Opera MUSIC FOR GRADE NINE pptx
Romantic Opera MUSIC FOR GRADE NINE pptx
 
Raw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptxRaw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptx
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in Pharmacovigilance
 

Scalable architectures for phenotype libraries

  • 1. Scalable architectures for phenotype libraries Building (inter)national phenotype libraries S39 Martin Chapman King’s College London #AMIA2023 AMIA 2023 Annual Symposium | amia.org 1
  • 2. Disclosure I and my spouse/partner have no relevant relationships with commercial interests to disclose.
  • 3. Learning objectives After participating in this session the learner should be better able to: • Understand how the software architecture, definition structure and hosting mechanisms behind a phenotype library affect the accessibility of hosted phenotypes, and thus their impact.
  • 4. Overview ‘The definitions in a phenotype library can only have an impact if they are accessible at scale.’ What do we mean by ‘accessible at scale’? 1. Can be downloaded from a library successfully by a large number of users (software architecture) 2. Can be interpreted (and thus implemented) by a range of different users (with knowledge of a range of different programming languages) working with a range of different datasets (definition architecture) 3. Can be successfully located by a broad range of users (distribution architecture)
  • 5. Running examples: Phenoflow and OHDSI Throughout, I will refer to two phenotype libraries, one of our own (Phenoflow), and for a broader perspective a popular, third-party library developed by OHDSI (footnote: I am not directly connected to OHDSI or an expert in their tools). Figure 1: Phenoflow phenotype library. Figure 2: OHDSI phenotype library overview.
  • 6. 1. Software architecture
  • 7. Building phenotype libraries Phenotype libraries are, or use as a part of their ecosystems, web applications. As such, we have a choice about how we build these applications. If we don’t build the application in a suitable way, we may fail to actually get phenotype definitions to people when, for example, there is high demand on a library. In other words, the library may fail to scale.
  • 8. Research software vs. user software Why may phenotype libraries not be built in a suitable way? Phenotype libraries are often built by researchers, whose experiences, preferences and goals may differ from those that would lead to a scalable library. For example, researchers are often familiar with and favour languages like Python, which do not necessarily scale as well as languages like V8-compiled JavaScript. Overall, there is often a tension between research software requirements and the requirements of software that is suitable for (large numbers of) users.
  • 9. Microservices To try to balance these requirements we can consider how to structure our software. A microservice design approach suggests that a system should be separated into individual communicating services (often via HTTP), each of which provides a single piece of overall system functionality. Figure 3: Example customer information microservice. Newman, 2019
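A minimal sketch of two components communicating over HTTP, using only the Python standard library. The service name, route, payload and port handling are assumptions made for illustration; they are not taken from Phenoflow or from Newman's example.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical data held by one service; not real library content.
DEFINITIONS = {"covid19": {"id": "covid19", "steps": 3}}


class DefinitionHandler(BaseHTTPRequestHandler):
    """A single-purpose service: look up a phenotype definition by id."""

    def do_GET(self):
        found = DEFINITIONS.get(self.path.strip("/"))
        body = json.dumps(found if found else {"error": "not found"}).encode()
        self.send_response(200 if found else 404)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # suppress per-request logging to keep the example quiet


def start_service():
    """Start the service on a free local port; return (server, base URL)."""
    server = HTTPServer(("127.0.0.1", 0), DefinitionHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_address[1]}"


def fetch_definition(base_url, definition_id):
    """A second component (e.g. a web portal) calling the service over HTTP."""
    with urllib.request.urlopen(f"{base_url}/{definition_id}") as response:
        return json.load(response)
```

Because the caller only depends on the HTTP interface, the service behind `fetch_definition` could be rewritten in another language or replicated without changing the caller.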
  • 10. Impact of microservices Because of the modularity of a microservice architecture, each service can be built using a different language, allowing languages to be combined. Therefore, user-facing components can be built using scalable languages, leaving researchers to build the remaining components in languages that best suit them. We refer to this as technological heterogeneity. We also gain the ability to replicate components (scalability), isolate components with long execution times in order to ensure the remainder of the system is not affected (resilience) and replace components with minimal impact to the rest of the system (replaceability).
  • 11. Phenoflow architecture Figure 4: Phenoflow’s microservice architecture (components: Web Portal/API, Generator, Visualiser, Implementation Units, VC server; actors: Author(s), User). Martin Chapman, Luke Rasmussen, et al. (2021). “Phenoflow: A Microservice Architecture for Portable Workflow-based Phenotype Definitions”. In: AMIA Joint Summits on Translational Science, pp. 142–151
  • 12. Phenoflow stack
  • 13. OHDSI architecture The OHDSI phenotype library leverages software like ATLAS, for which several key architectural considerations have been made, particularly the use of containerisation (Docker) through initiatives like Broadsea. OHDSI software like HADES (a set of R packages for analytics) is also Dockerised. https://github.com/OHDSI/Broadsea
  • 14. (Slight detour...) CONSULT I To show the benefits of a microservice approach when developing research software, we can briefly consider CONSULT, a decision-support system for stroke patients that was developed under the paradigm. Figure 5: CONSULT’s dashboard interface Figure 6: CONSULT’s chatbot interface
  • 15. (Slight detour...) CONSULT II Figure 7: CONSULT’s microservice architecture. Data sources: blood pressure (Withings API), pulse and activity (Garmin API), heart rate / ECG (Medibiosense API), EHR (EMIS). Services: Device Integration (Withings, Garmin, Vitalpatch), Sensor-FHIR converter, EHR Integration (EMIS), EHR-FHIR converter, FHIR Health Data Server, Message Passer, Dialogue Manager, Authentication Server, Provenance Server, Data Miner, Argumentation Engine, Chat Server, UI backend. Clients: Tablet Browser, PC Browser.
  • 16. Scalability in practice CONSULT’s user-facing components are built using scalable languages, while its remaining components are built using languages that are more traditionally found in research software. Its components can also be replicated. We tested the ability of the CONSULT architecture (specifically its sensor integration components) to respond to high load and obtained positive results. Martin Chapman, Abigail G-Medhin, et al. (2022). “Using Microservices to Design Patient-facing Research Software”. In: Proceedings of the IEEE 18th International Conference on e-Science (e-Science), pp. 44–54. Figure 8: How CONSULT responds to high load vs. an emulated monolithic architecture (average responses, Ok vs. Timeout).
  • 17. Microservices in industry The use of microservices in industrial settings (and in the wider software development community) is commonplace. However (at least in our experience) research software does not adopt industry paradigms like this. Wider goal: encourage the use of established software engineering techniques when developing research software.
  • 18. 2. Definition architecture
  • 21. Phenotype definition challenges • Phenotype definitions come in lots of different forms (flowcharts, text descriptions, weights for a classifier, etc.) and lack standardisation. This reduces intelligibility and thus phenotypic reproducibility at scale (how broadly the logic intended by the definition author can be accurately implemented). We need standardised models to structure definitions. • Computable phenotypes often don’t exist at all. This affects phenotypic portability (the effort associated with implementing a definition is high, limiting its adoption at scale). We need mechanisms for generating and storing computable forms of definitions.
  • 22. Phenoflow’s definition model I A new Common Workflow Language (CWL)-based model for the definition of a phenotype. Figure 9: CWL-based definition model (step) and implementation units*. Abstract layer: a step with a number, group, id, description and type. Functional layer: an Input (id, description) and an Output (id, description, extension). Computational layer: implementation units (A, B, ...), each with a path, language and params. *the bits of code actually executed by definitions structured under this model; separate from the model itself.
  • 23. Phenoflow’s definition model II Model is separated into layers: • Abstract: Expresses the logic of a phenotype through a set of simple sequential, potentially nested steps, each of which is annotated with multiple descriptions. Emphasis on intelligibility. • Functional: Specifies the metadata of entities passed between the operations within the abstract layer, e.g., the format of an intermediate cohort. • Computational: Defines an environment for the execution of one or more implementation units (e.g. a script, data pipeline module, etc.) for each step in the abstract layer. Inherently supports implementation by providing a template for development in any language.
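The three layers can be sketched as plain data structures. This is an illustrative reading of the model, not Phenoflow's actual (CWL-based) representation: the class and attribute names are assumptions, while the example values are those shown for the COVID-19 step in Figure 10.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entity:
    """Functional layer: metadata for data passed between steps."""
    id: str
    description: str
    extension: str = ""  # file format, shown on the step output in the figures


@dataclass
class ImplementationUnit:
    """Computational layer: a piece of code that can execute a step."""
    path: str
    language: str
    params: str = ""


@dataclass
class Step:
    """Abstract layer: one described step of the phenotype's logic."""
    number: int
    id: str
    description: str
    type: str
    input: Entity
    output: Entity
    group: str = ""
    implementation_units: List[ImplementationUnit] = field(default_factory=list)

    def languages(self):
        """Languages in which this step currently has an implementation."""
        return sorted(unit.language for unit in self.implementation_units)


icd10_step = Step(
    number=2,
    id="icd10",
    description="A case is identified in the presence of patients "
                "associated with the stated icd10 COVID-19 codes.",
    type="logic",
    input=Entity("covid19_cohort", "Potential covid19 cases."),
    output=Entity("covid19_cases_icd10",
                  "covid19 cases, as identified by icd10 coding.", "csv"),
    implementation_units=[
        ImplementationUnit("icd10.py", "python"),
        ImplementationUnit("icd10.js", "javascript"),
    ],
)
```

Keeping the implementation units in a list separate from the step's description is what lets the same logic be offered in several languages at once.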
  • 24. Phenoflow’s definition model III Figure 10: Individual step of COVID-19 phenotype definition and implementation units. Abstract layer: step number 2, group -, id icd10, description ‘A case is identified in the presence of patients associated with the stated icd10 COVID-19 codes.’, type logic. Functional layer: input covid19_cohort (‘Potential covid19 cases.’); output covid19_cases_icd10 (‘covid19 cases, as identified by icd10 coding.’), extension csv. Computational layer, implementation units (both excerpts are truncated on the slide):
    icd10.py (python, params -):
        for row in csv_reader:
            newRow = row.copy()
            for cell in row:
                if [value for value in row[cell].split(",") if value in codes]:
                    newRow["covid19"] = "CASE"
        ...
    icd10.js (javascript, params -):
        for (row of csvData) {
            newRow = row.slice();
            for (cell of row) {
                if (cell.split(",").filter(code=>codes.indexOf(code) > -1).length) {
                    newRow.push("CASE");
        ...
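A self-contained sketch of what the Python implementation unit above appears to do. The slide's excerpt is truncated, so the function name, the sample cohort, the whitespace stripping and the handling of non-matching rows (left unlabelled here) are assumptions. The code set is an illustrative list of the ICD-10 emergency-use codes for COVID-19; a real definition would supply its own.

```python
import csv
import io

# Assumed code list for illustration only.
COVID19_ICD10_CODES = {"U07.1", "U07.2"}


def label_cases(rows, codes):
    """Copy each row, adding covid19="CASE" when any cell lists a matching code."""
    labelled = []
    for row in rows:
        new_row = dict(row)
        for cell in row.values():
            # A cell may hold several comma-separated codes.
            if any(value.strip() in codes for value in cell.split(",")):
                new_row["covid19"] = "CASE"
        labelled.append(new_row)
    return labelled


# Hypothetical two-patient cohort standing in for the step's csv input.
SAMPLE = 'patient_id,diagnoses\n1,"J18.9,U07.1"\n2,I10\n'
cohort = list(csv.DictReader(io.StringIO(SAMPLE)))
cases = label_cases(cohort, COVID19_ICD10_CODES)
```

Here the first patient is labelled as a case and the second is passed through unchanged.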
  • 25. Phenoflow definition parsing In addition to providing an intelligible model that supports different implementations, Phenoflow also actively parses definitions from a variety of sources (including the HDR UK phenotype library) under this model, thus providing pre-made computable forms. This also solves the ‘it’s only useful if it’s used’ issue often associated with any kind of model. [Figure: Phenoflow architecture diagram, showing the same components as Figure 4]
  • 26. OHDSI OMOP phenotypes (cohorts) The phenotypes found in the OHDSI phenotype library also have an expected structure, and thus exhibit many of the same benefits. While this structure is tied to the OMOP CDM, there is work going on around OMOP interoperability. Figure 11: An OHDSI phenotype definition (cohort)
  • 27. Phenoflow connectors The Phenoflow model imposes a number of other constraints, including: • The first step must be of a connector type (currently load or external), designed to extract data from a data source without performing any processing on that data, and pass it to the second step. • Other steps in a definition must describe the logic of the phenotype (types currently boolean logic and generic logic (supporting, for example, case exclusion)). [Figure: OMOP and FHIR connectors feeding Step 2 and Step 3] Together, these two elements of the model promote interoperability with a variety of different data standards (including OMOP itself). Martin Chapman, Luke V Rasmussen, et al. (2022). “Connecting computable phenotypes with multiple Health IT Standards using the Phenoflow library”. In: AMIA Clinical Informatics Conference
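The two constraints above can be checked mechanically. The following is a sketch only: the type strings `boolean` and `logic` are assumed spellings of the ‘boolean logic’ and ‘generic logic’ types, and the function is hypothetical rather than part of Phenoflow.

```python
CONNECTOR_TYPES = {"load", "external"}
LOGIC_TYPES = {"boolean", "logic"}  # assumed type strings


def validate_step_types(step_types):
    """Return a list of constraint violations; an empty list means none found."""
    if not step_types:
        return ["a definition needs at least one step"]
    problems = []
    # Constraint 1: the first step only extracts data from a data source.
    if step_types[0] not in CONNECTOR_TYPES:
        problems.append(f"step 1 must be a connector, got '{step_types[0]}'")
    # Constraint 2: every later step describes phenotype logic.
    for number, step_type in enumerate(step_types[1:], start=2):
        if step_type not in LOGIC_TYPES:
            problems.append(f"step {number} must be a logic step, got '{step_type}'")
    return problems
```

For example, `validate_step_types(["load", "logic", "boolean"])` reports no problems, whereas a definition that begins with a logic step is flagged. Confining data access to the first step is what allows a connector for one data standard to be swapped for another without touching the remaining steps.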
  • 28. 3. Distribution architecture
  • 29. Finding definitions The ability to locate a phenotype definition is also a key part of its accessibility at scale. When designing a library, we have a choice about where to host it and which existing platforms and technologies to potentially connect to. These choices can impact the discoverability of the definitions hosted.
  • 30. Version control systems I OHDSI’s phenotype library is (in part) hosted on GitHub, a remote version control system (VCS). This neatly provides a mechanism to distribute phenotypes and ensure they can be located (by considering the FAIR principles), while at the same time having important library features available such as versioning. Martin Chapman, Shahzad Mumtaz, et al. (2021). “Desiderata for the development of next-generation phenotype libraries”. In: Gigascience 10.9, pp. 1–13. Figure 12: OHDSI’s phenotype library on GitHub
  • 31. Version control systems II Inspired by OHDSI’s approach, Phenoflow is being migrated to a VCS-backed (GitHub) architecture. In doing so, we aim to leverage even more of the features provided by a VCS, including the use of branches for different connectors. Figure 13: Phenoflow’s new VCS-backed architecture (components: API, Generator, Visualiser, GitHub; actors: Author(s), User). Martin Chapman, Luke V Rasmussen, et al. (2023). “Using Version Control Systems to Support High-Quality Phenotype Definitions”. In: AMIA Joint Summits on Translational Science, p. 816
  • 32. Summary It is important that the definitions we host in phenotype libraries are accessible at scale. Definitions are accessible at scale if they can be easily located, downloaded and interpreted by large numbers of users. We can ensure this is the case by carefully considering how we structure phenotype libraries and the definitions they contain. If definitions are accessible they can have an impact. They can, for example, be reused, ultimately supporting reproducible research.
  • 33. Thank you!