Phenoflow: An Architecture for Computable Phenotypes

Phenoflow: An Architecture for Computable Phenotypes
Applying FAIR principles to computable phenotype libraries - ELIXIR All Hands
Martin Chapman
Tuesday 7 June, 2022
King’s College London
Overview
The story so far...
Phenotype models
Phenotype tooling
Applications
The future
1
The story so far...
HDR UK Phenotype Library
Having a standard structure at the definition level goes a long way towards having FAIR
phenotypes (e.g. reusable and interoperable).
2
What about implementation?
SELECT UserID, COUNT(DISTINCT AbnormalLab) AS abnormal lab
FROM Patients
GROUP BY UserID
HAVING abnormal lab > 0;
...
We should not, however, ignore the computable forms of these definitions—which either exist
already or can be provided—as they have a significant impact on portability (how easy it is to
implement a definition).
However, storing these implementations alongside our definitions risks compromising their
FAIRness:
3
(un)FAIR computable phenotypes i
It can be quite difficult to determine a suitable connection between a definition and its
computable form. As a result, the two are often conflated, reducing intelligibility and thus
reusability.
4
(un)FAIR computable phenotypes ii
Can take many different forms, making them hard to represent in a consistent (and thus
intelligible) way.
5
(un)FAIR computable phenotypes iii
Often have a complex, idiomatic structure, also reducing intelligibility.
6
(un)FAIR computable phenotypes iv
https://data.ohdsi.org/PhenotypeLibrary
Tied to a single standard that is not universally adopted, e.g. OHDSI’s gold standard
phenotype library and the OMOP CDM. This reduces interoperability.
7
Phenotype models
Phenotype models i
Phenotype models govern the information required for, and the structure of, phenotype
definitions.
• They may, for example, govern the logical connectives available to a definition author
when producing a definition (e.g. conjunction and disjunction).
Would argue that the HDR UK phenotype library achieves FAIR principles by having an
inherent model, which all stored definitions must adhere to.
A more explicit model can help us to also store computable phenotypes in a FAIR way.
8
Phenotype model requirements
We need an explicit model that:
1. Connects, yet keeps distinct, a phenotype definition and its computable form.
1.1 A definition needs to remain suitably abstract while making provision for an associated
computable counterpart.
9
Phenotype model requirements
We need an explicit model that:
1. Connects, yet keeps distinct, a phenotype definition and its computable form.
1.1 A definition needs to remain suitably abstract while making provision for an associated
computable counterpart.
2. Accommodates (and potentially combines) different computable forms (one-to-many
connectivity).
9
Phenotype model requirements
We need an explicit model that:
1. Connects, yet keeps distinct, a phenotype definition and its computable form.
1.1 A definition needs to remain suitably abstract while making provision for an associated
computable counterpart.
2. Accommodates (and potentially combines) different computable forms (one-to-many
connectivity).
3. Retains clarity, even in the presence of a complex computable form.
9
Phenotype model requirements
We need an explicit model that:
1. Connects, yet keeps distinct, a phenotype definition and its computable form.
1.1 A definition needs to remain suitably abstract while making provision for an associated
computable counterpart.
2. Accommodates (and potentially combines) different computable forms (one-to-many
connectivity).
3. Retains clarity, even in the presence of a complex computable form.
4. Supports a variety of target data formats to enable sharing.
9
Phenoflow workflow-based model i
The Phenoflow workflow-based model structures definitions as a set of sequential steps, which
effectively transition a population of patients to a cohort that exhibit the condition captured.
We often describe the model as a non-formal
subset of the Common Workflow Language
(CWL), where properties such as the ontology
used and the step types available are fixed.
10
Phenoflow workflow-based model ii
Each step in the model consists of three layers:
• Abstract: Expresses the logic of that step. Says nothing about implementation.
• Functional: Specifies the inputs to, and outputs from, this step (metadata) e.g., the
format of an intermediate cohort.
• Computational: Defines an environment for the execution of one or more
implementation units (e.g. a script, data pipeline module, etc.).
11
Phenoflow workflow-based model iii
number group id description type
step
Input Output
id description id description extensionA
pathA languageA paramsA
implementationUnitA
Computational
Implementation
Units
pathB languageB paramsB
implementationUnitB
Abstract
Functional
Figure 1: Structured phenotype definition model (step) and implementation units.
12
(1) Separate, yet connect, a phenotype definition with its computable form
The separation of the model into logic and implementation layers provides the required
connectivity with a computable form, without affecting abstraction:
2 - icd10 A case is identified in the presence of pa-
tients associated with the stated icd10
COVID-19 codes.
logic
step
Input Output
covid19 cohort Potential covid19
cases.
covid19 cases icd10 covid19 cases, as
identified by icd10
coding.
csv
icd10.py python -
for row in c s v r e a d e r :
newRow = row . copy ()
for c e l l in row :
i f [ value for value in
row [ c e l l ] . s p l i t ( ” , ” )
i f value in codes ] :
newRow [ ” covid19 ” ] = ”CASE”
...
Computational
Implementation
Units
icd10.js javascript -
for ( row of csvData ){
newRow = row . s l i c e ( ) ;
for ( c e l l of row ){
i f ( c e l l . s p l i t ( ” , ” )
. f i l t e r ( code=>codes .
indexOf ( code )>−1). length ){
newRow . push ( ”CASE” ) ;
...
Abstract
Functional
Figure 2: Individual step of COVID-19 code-based Phenoflow definition and implementation units.
13
(2) Accommodate (and potentially combine) all definitions types i
The generality of the model allows it to capture information relating to a wide range of
different definition types.
Similarly, the separation of logic into steps, with clear inputs and outputs, makes each step
self-contained, allowing types to be mixed within a single definition.
• One step may identify patients based on a list of codes, while a subsequent step may
describe the use of more complex NLP techniques in order to identify patients.
14
(2) Accommodate (and potentially combine) all definitions types ii
1 case assignment rx t2dm med-
abnormal lab
A case is identified in the presence of
an abnormal lab value (defined as one
of three different exacerbations in blood
sugar level) AND if medication for this
type of diabetes has been prescribed.
boolean
step
Input Output
dm cohort-
abnormal lab
Potential t2dm
cases, with ab-
normal lab results
identified
dm cases-
case assignment 1
t2dm cases, as iden-
tified by first case as-
signment rule
csv
rx t2dm med-abnormal lab.knwf knime -File= Computational
Implementation
Units
rx t2dm med-abnormal lab.py python -
...
i f row [ ” t1dm dx cnt ” ] == ”0”
and row [ ” t2dm dx cnt ” ] == ”0”
and ” t2dm rx dts ” in c s v r e a d e r . fieldnames
and ” abnormal lab ” in c s v r e a d e r . fieldnames
and row [ ” abnormal lab ” ] == ”1” :
row [ ”t2dm” ] = ”CASE” ;
...
Abstract
Functional
Figure 3: Individual step of T2DM logic-based Phenoflow definition and implementation units.
15
(3) Prioritise clarity to combat the complexity of implementations
On top of definitions now having an expected structure, separation into steps provides a logical
flow.
Each step provides three descriptions of the functionality it contains, to aid clarity:
1. A single ID, providing an overview of the step’s functionality.
2. A longer description of the functionality contained within the step.
3. A classification of the step under a pre-defined ontology, so that even if the ID and
description are not sufficient, a general understanding of the functionality of the step can
still be extracted1
.
Inputs and outputs to each step provide further information.
1As of now we still use simple classification, e.g. logic, but finer a granularity of types is forthcoming.
16
(4) Support a variety of target data formats
We add additional contraints to the workflow-based structure to dictate that the first step in a
definition is always a data read (and the last is always a cohort output).
Because of the modularity of the model structure, we are able to swap in and out the logic,
and associated implementation, of the data read step – while the other steps remain
unchanged – in order to accommodate multiple data formats.
More on this connector approach shortly.
17
Phenotype tooling
Phenoflow web architecture
The Phenoflow model is complemented by a web architecture that allows us to realise these
benefits.
Chapman, Martin, et al. “Phenoflow: A microservice architecture for portable workflow-based
phenotype definitions.” AMIA, 2021. 18
Parse i
“A model is only useful if it’s used”.
The primary way in which we connect definitions with computable forms is by parsing them
from existing sources.
19
Parse ii
Grab data relating to a definition (codelist, drug list, keywords, more complex logic).
Determine where key information is within this
data (e.g. a ‘conceptid’ column in a codelist CSV).
Populate the information required for the Phenoflow model au-
tomatically (e.g. step structure – easier for something like a
codelist; may require some intervention with more complex logic).
Automatically generate implementation units for each step.
Add these to the Phenoflow library ready for download.
20
Parsing example—HDR UK national phenomics resource i
Have imported, standardised and provided implementations for ∼300 existing definitions as
a part of the HDR phenomics resource:
Phenoflow
Concept Library
(Web Interface +
API access)
CALIBER
Gateway
User
Bulk import API access
Workflow link Workflow link
Dataset link
Phenotype link
Dataset link
Phenotype Link
(API access)
Library
Platform
21
Parsing example—HDR UK national phenomics resource ii
These are now directly linked to from the HDR UK Phenotype Portal:
22
Author
Phenotypes can also be authored directly, by submitting specifications to Phenoflow’s API.
• Here, multiple implementations can be provided across the steps, and for each step,
either differing in language (e.g. python vs. javascript) or in form (e.g. a code list vs. a
trained classifier).
23
View
Parsed or authored definitions can be viewed under the model (including the
multi-dimensional descriptions) within the library.
24
Download
Pick a connector to be the first step in the workflow, depending on the format of the dataset
you are targeting (e.g., OMOP, i2b2, FHIR).
• Credentials for data stores are entered locally.
Implementations for other steps can also be selected from those uploaded, to match local
development capacity, or simply left as they are for out-of-the-box execution.
25
Applications
Clinical trials i
Recruitment in the REST clinical trial (AOMd) was handled using the TRANSFoRm e-source
trial platform.
In the original trial, archetype-based criteria were translated to concrete implementations
(e.g. XPath queries) by TRANSFoRm in order to determine a patient’s eligibility from their
EHR.
26
Clinical trials ii
We developed a new service (PhEM) that instead enables the execution of a computable
phenotype against an EHR in order to identify eligible patients at the point-of-care.
EHR
integration
Archetype
execution
Data Node Connector
GP
Patient
record
Study system
Archetype
translation
Phenotype
execution
microservice
(PhEM)
The use of PhEM was shown to increase recruitment accuracy.
Chapman, Martin, et al. “Using Computable Phenotypes in Point-of-Care Clinical Trial
Recruitment”. MIE, 2021.
27
Provenance
A reverse application: connected Phenoflow with the Data Provenance Template server, a
piece of software that holds structured fragments of provenance.
These fragments record the evolution
of the data (definitions) within Phe-
noflow, as they are edited by users, im-
proving reusability (R1.2):
used used wasAssociatedWith
wasGeneratedBy
var:updated
prov:end vvar:time
prov:type phenoflow#Updated
zone:id update
var:author
prov:type phenoflow#Author
var:phenotypeAfter
phenoflow:description vvar:description
phenoflow:name vvar:name
prov:type phenoflow#Phenotype
var:phenotypeBefore
prov:type phenoflow#Phenotype
var:step
phenoflow:coding vvar:coding
phenoflow:doc vvar:doc
phenoflow:position vvar:position
phenoflow:stepName vvar:stepName
phenoflow:type vvar:type
prov:type phenoflow#Step
zone:id update
Fairweather, Elliot, et al. “A delayed instantiation approach to template-driven provenance for
electronic health record phenotyping”. IPAW, 2020.
28
The future
Other future work i
• Always interested in parsing definitions from new sources.
• Publish more implementations for complex disease-specific phenotypes, e.g. long covid
(LOCOMOTION; phenotypes from NW London GP records) and stroke (KCL NIHR;
phenotypes from SLSR).
• Increase the library of workflow modules (e.g. types of dataset connectors) ready for
download and use.
• Automatic data conversion to enable use of different implementation techniques on same
dataset, e.g. conversion from CSV to DB to allow use of SQL scripts.
29
Thank you!
Welcome to visit Phenoflow itself: http://kclhi.org/phenoflow.
Learn more about how to interact with the library:
https://github.com/kclhi/phenoflow/wiki.
View the architecture on Github: https://github.com/kclhi/phenoflow.
Publications mentioned: https://martinchapman.co.uk/publications/pheno.
Contact: @martin chap man.
30
1 of 39

Recommended

Phenoflow 2021 by
Phenoflow 2021Phenoflow 2021
Phenoflow 2021Martin Chapman
118 views53 slides
Phenoflow: A Microservice Architecture for Portable Workflow-based Phenotype ... by
Phenoflow: A Microservice Architecture for Portable Workflow-based Phenotype ...Phenoflow: A Microservice Architecture for Portable Workflow-based Phenotype ...
Phenoflow: A Microservice Architecture for Portable Workflow-based Phenotype ...Martin Chapman
63 views21 slides
Using CWL to support EHR-based phenotyping by
Using CWL to support EHR-based phenotypingUsing CWL to support EHR-based phenotyping
Using CWL to support EHR-based phenotypingMartin Chapman
111 views46 slides
Exploring Models of Computation through Static Analysis by
Exploring Models of Computation through Static AnalysisExploring Models of Computation through Static Analysis
Exploring Models of Computation through Static Analysisijeukens
209 views11 slides
Unit 1 by
Unit  1Unit  1
Unit 1donny101
221 views48 slides
Pretzel: optimized Machine Learning framework for low-latency and high throu... by
Pretzel: optimized Machine Learning framework for  low-latency and high throu...Pretzel: optimized Machine Learning framework for  low-latency and high throu...
Pretzel: optimized Machine Learning framework for low-latency and high throu...NECST Lab @ Politecnico di Milano
108 views25 slides

More Related Content

Similar to Phenoflow: An Architecture for Computable Phenotypes

Principal of objected oriented programming by
Principal of objected oriented programming Principal of objected oriented programming
Principal of objected oriented programming Rokonuzzaman Rony
562 views32 slides
C++ programing lanuage by
C++ programing lanuageC++ programing lanuage
C++ programing lanuageNimai Chand Das
697 views24 slides
Interoduction to c++ by
Interoduction to c++Interoduction to c++
Interoduction to c++Amresh Raj
275 views21 slides
Executive SummaryIntroductionProtein engineering by
Executive SummaryIntroductionProtein engineeringExecutive SummaryIntroductionProtein engineering
Executive SummaryIntroductionProtein engineeringBetseyCalderon89
2 views21 slides
Bt0066 database management system2 by
Bt0066 database management system2Bt0066 database management system2
Bt0066 database management system2Techglyphs
91 views14 slides
Case Study On System Requirement Modeling by
Case Study On System Requirement ModelingCase Study On System Requirement Modeling
Case Study On System Requirement ModelingLaura Scott
3 views39 slides

Similar to Phenoflow: An Architecture for Computable Phenotypes(20)

Principal of objected oriented programming by Rokonuzzaman Rony
Principal of objected oriented programming Principal of objected oriented programming
Principal of objected oriented programming
Rokonuzzaman Rony562 views
Interoduction to c++ by Amresh Raj
Interoduction to c++Interoduction to c++
Interoduction to c++
Amresh Raj275 views
Executive SummaryIntroductionProtein engineering by BetseyCalderon89
Executive SummaryIntroductionProtein engineeringExecutive SummaryIntroductionProtein engineering
Executive SummaryIntroductionProtein engineering
Bt0066 database management system2 by Techglyphs
Bt0066 database management system2Bt0066 database management system2
Bt0066 database management system2
Techglyphs91 views
Case Study On System Requirement Modeling by Laura Scott
Case Study On System Requirement ModelingCase Study On System Requirement Modeling
Case Study On System Requirement Modeling
Laura Scott3 views
Software architacture recovery by Imdad Ul Haq
Software architacture recoverySoftware architacture recovery
Software architacture recovery
Imdad Ul Haq87 views
A Formal Executable Semantics Of Verilog by Tracy Morgan
A Formal Executable Semantics Of VerilogA Formal Executable Semantics Of Verilog
A Formal Executable Semantics Of Verilog
Tracy Morgan2 views
BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (... by kevig
BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (...BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (...
BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (...
kevig117 views
BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (... by ijnlc
BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (...BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (...
BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (...
ijnlc66 views
Password protected diary by SHARDA SHARAN
Password protected diaryPassword protected diary
Password protected diary
SHARDA SHARAN2.1K views
A Domain-Specific Embedded Language for Programming Parallel Architectures. by Jason Hearne-McGuiness
A Domain-Specific Embedded Language for Programming Parallel Architectures.A Domain-Specific Embedded Language for Programming Parallel Architectures.
A Domain-Specific Embedded Language for Programming Parallel Architectures.
Using Met-modeling Graph Grammars and R-Maude to Process and Simulate LRN Models by Waqas Tariq
Using Met-modeling Graph Grammars and R-Maude to Process and Simulate LRN ModelsUsing Met-modeling Graph Grammars and R-Maude to Process and Simulate LRN Models
Using Met-modeling Graph Grammars and R-Maude to Process and Simulate LRN Models
Waqas Tariq235 views
Determan SummerSim_submit_rev3 by John Determan
Determan SummerSim_submit_rev3Determan SummerSim_submit_rev3
Determan SummerSim_submit_rev3
John Determan54 views
Pointcut rejuvenation by Ravi Theja
Pointcut rejuvenationPointcut rejuvenation
Pointcut rejuvenation
Ravi Theja138 views
Reference Representation in Large Metamodel-based Datasets by Markus Scheidgen
Reference Representation in Large Metamodel-based DatasetsReference Representation in Large Metamodel-based Datasets
Reference Representation in Large Metamodel-based Datasets
Markus Scheidgen1.3K views
Abstracting Strings For Model Checking Of C Programs by Martha Brown
Abstracting Strings For Model Checking Of C ProgramsAbstracting Strings For Model Checking Of C Programs
Abstracting Strings For Model Checking Of C Programs
Martha Brown2 views

More from Martin Chapman

Using AI to autonomously identify diseases within groups of patients by
Using AI to autonomously identify diseases within groups of patientsUsing AI to autonomously identify diseases within groups of patients
Using AI to autonomously identify diseases within groups of patientsMartin Chapman
13 views7 slides
Using AI to understand how preventative interventions can improve the health ... by
Using AI to understand how preventative interventions can improve the health ...Using AI to understand how preventative interventions can improve the health ...
Using AI to understand how preventative interventions can improve the health ...Martin Chapman
23 views45 slides
Principles of Health Informatics: Evaluating medical software by
Principles of Health Informatics: Evaluating medical softwarePrinciples of Health Informatics: Evaluating medical software
Principles of Health Informatics: Evaluating medical softwareMartin Chapman
21 views50 slides
Principles of Health Informatics: Usability of medical software by
Principles of Health Informatics: Usability of medical softwarePrinciples of Health Informatics: Usability of medical software
Principles of Health Informatics: Usability of medical softwareMartin Chapman
25 views60 slides
Principles of Health Informatics: Social networks, telehealth, and mobile health by
Principles of Health Informatics: Social networks, telehealth, and mobile healthPrinciples of Health Informatics: Social networks, telehealth, and mobile health
Principles of Health Informatics: Social networks, telehealth, and mobile healthMartin Chapman
4 views56 slides
Principles of Health Informatics: Communication systems in healthcare by
Principles of Health Informatics: Communication systems in healthcarePrinciples of Health Informatics: Communication systems in healthcare
Principles of Health Informatics: Communication systems in healthcareMartin Chapman
42 views58 slides

More from Martin Chapman(20)

Using AI to autonomously identify diseases within groups of patients by Martin Chapman
Using AI to autonomously identify diseases within groups of patientsUsing AI to autonomously identify diseases within groups of patients
Using AI to autonomously identify diseases within groups of patients
Martin Chapman13 views
Using AI to understand how preventative interventions can improve the health ... by Martin Chapman
Using AI to understand how preventative interventions can improve the health ...Using AI to understand how preventative interventions can improve the health ...
Using AI to understand how preventative interventions can improve the health ...
Martin Chapman23 views
Principles of Health Informatics: Evaluating medical software by Martin Chapman
Principles of Health Informatics: Evaluating medical softwarePrinciples of Health Informatics: Evaluating medical software
Principles of Health Informatics: Evaluating medical software
Martin Chapman21 views
Principles of Health Informatics: Usability of medical software by Martin Chapman
Principles of Health Informatics: Usability of medical softwarePrinciples of Health Informatics: Usability of medical software
Principles of Health Informatics: Usability of medical software
Martin Chapman25 views
Principles of Health Informatics: Social networks, telehealth, and mobile health by Martin Chapman
Principles of Health Informatics: Social networks, telehealth, and mobile healthPrinciples of Health Informatics: Social networks, telehealth, and mobile health
Principles of Health Informatics: Social networks, telehealth, and mobile health
Martin Chapman4 views
Principles of Health Informatics: Communication systems in healthcare by Martin Chapman
Principles of Health Informatics: Communication systems in healthcarePrinciples of Health Informatics: Communication systems in healthcare
Principles of Health Informatics: Communication systems in healthcare
Martin Chapman42 views
Principles of Health Informatics: Terminologies and classification systems by Martin Chapman
Principles of Health Informatics: Terminologies and classification systemsPrinciples of Health Informatics: Terminologies and classification systems
Principles of Health Informatics: Terminologies and classification systems
Martin Chapman49 views
Principles of Health Informatics: Representing medical knowledge by Martin Chapman
Principles of Health Informatics: Representing medical knowledgePrinciples of Health Informatics: Representing medical knowledge
Principles of Health Informatics: Representing medical knowledge
Martin Chapman18 views
Principles of Health Informatics: Informatics skills - searching and making d... by Martin Chapman
Principles of Health Informatics: Informatics skills - searching and making d...Principles of Health Informatics: Informatics skills - searching and making d...
Principles of Health Informatics: Informatics skills - searching and making d...
Martin Chapman46 views
Principles of Health Informatics: Informatics skills - communicating, structu... by Martin Chapman
Principles of Health Informatics: Informatics skills - communicating, structu...Principles of Health Informatics: Informatics skills - communicating, structu...
Principles of Health Informatics: Informatics skills - communicating, structu...
Martin Chapman964 views
Principles of Health Informatics: Models, information, and information systems by Martin Chapman
Principles of Health Informatics: Models, information, and information systemsPrinciples of Health Informatics: Models, information, and information systems
Principles of Health Informatics: Models, information, and information systems
Martin Chapman28 views
Using AI to understand how preventative interventions can improve the health ... by Martin Chapman
Using AI to understand how preventative interventions can improve the health ...Using AI to understand how preventative interventions can improve the health ...
Using AI to understand how preventative interventions can improve the health ...
Martin Chapman29 views
Using Microservices to Design Patient-facing Research Software by Martin Chapman
Using Microservices to Design Patient-facing Research SoftwareUsing Microservices to Design Patient-facing Research Software
Using Microservices to Design Patient-facing Research Software
Martin Chapman9 views
COVID-19 Analytics in Jupyter: Intuitive Provenance Integration using ProvIt by Martin Chapman
COVID-19 Analytics in Jupyter: Intuitive Provenance Integration using ProvItCOVID-19 Analytics in Jupyter: Intuitive Provenance Integration using ProvIt
COVID-19 Analytics in Jupyter: Intuitive Provenance Integration using ProvIt
Martin Chapman138 views
Using computable phenotypes in point of care clinical trial recruitment by Martin Chapman
Using computable phenotypes in point of care clinical trial recruitmentUsing computable phenotypes in point of care clinical trial recruitment
Using computable phenotypes in point of care clinical trial recruitment
Martin Chapman39 views
BlocVote: An E-voting system providing an anonymous, secure, transparent, and... by Martin Chapman
BlocVote: An E-voting system providing an anonymous, secure, transparent, and...BlocVote: An E-voting system providing an anonymous, secure, transparent, and...
BlocVote: An E-voting system providing an anonymous, secure, transparent, and...
Martin Chapman85 views
MICRE: Microservices In MediCal Research Environments by Martin Chapman
MICRE: Microservices In MediCal Research EnvironmentsMICRE: Microservices In MediCal Research Environments
MICRE: Microservices In MediCal Research Environments
Martin Chapman220 views
A Microservice Architecture for the Design of Computer-Interpretable Guidelin... by Martin Chapman
A Microservice Architecture for the Design of Computer-Interpretable Guidelin...A Microservice Architecture for the Design of Computer-Interpretable Guidelin...
A Microservice Architecture for the Design of Computer-Interpretable Guidelin...
Martin Chapman309 views
Martin Chapman: Research Overview, 2019 by Martin Chapman
Martin Chapman: Research Overview, 2019Martin Chapman: Research Overview, 2019
Martin Chapman: Research Overview, 2019
Martin Chapman374 views
Smart Home Inventory Management using a Private Blockchain and a Purchase Ord... by Martin Chapman
Smart Home Inventory Management using a Private Blockchain and a Purchase Ord...Smart Home Inventory Management using a Private Blockchain and a Purchase Ord...
Smart Home Inventory Management using a Private Blockchain and a Purchase Ord...
Martin Chapman385 views

Recently uploaded

Ecology by
Ecology Ecology
Ecology Abhijith Raj.R
6 views10 slides
Artificial Intelligence Helps in Drug Designing and Discovery.pptx by
Artificial Intelligence Helps in Drug Designing and Discovery.pptxArtificial Intelligence Helps in Drug Designing and Discovery.pptx
Artificial Intelligence Helps in Drug Designing and Discovery.pptxabhinashsahoo2001
117 views22 slides
Connecting communities to promote FAIR resources: perspectives from an RDA / ... by
Connecting communities to promote FAIR resources: perspectives from an RDA / ...Connecting communities to promote FAIR resources: perspectives from an RDA / ...
Connecting communities to promote FAIR resources: perspectives from an RDA / ...Allyson Lister
33 views49 slides
Workshop LLM Life Sciences ChemAI 231116.pptx by
Workshop LLM Life Sciences ChemAI 231116.pptxWorkshop LLM Life Sciences ChemAI 231116.pptx
Workshop LLM Life Sciences ChemAI 231116.pptxMarco Tibaldi
101 views37 slides
scopus cited journals.pdf by
scopus cited journals.pdfscopus cited journals.pdf
scopus cited journals.pdfKSAravindSrivastava
5 views15 slides
Guinea Pig as a Model for Translation Research by
Guinea Pig as a Model for Translation ResearchGuinea Pig as a Model for Translation Research
Guinea Pig as a Model for Translation ResearchPervaizDar1
11 views21 slides

Recently uploaded(20)

Artificial Intelligence Helps in Drug Designing and Discovery.pptx by abhinashsahoo2001
Artificial Intelligence Helps in Drug Designing and Discovery.pptxArtificial Intelligence Helps in Drug Designing and Discovery.pptx
Artificial Intelligence Helps in Drug Designing and Discovery.pptx
abhinashsahoo2001117 views
Connecting communities to promote FAIR resources: perspectives from an RDA / ... by Allyson Lister
Connecting communities to promote FAIR resources: perspectives from an RDA / ...Connecting communities to promote FAIR resources: perspectives from an RDA / ...
Connecting communities to promote FAIR resources: perspectives from an RDA / ...
Allyson Lister33 views
Workshop LLM Life Sciences ChemAI 231116.pptx by Marco Tibaldi
Workshop LLM Life Sciences ChemAI 231116.pptxWorkshop LLM Life Sciences ChemAI 231116.pptx
Workshop LLM Life Sciences ChemAI 231116.pptx
Marco Tibaldi101 views
Guinea Pig as a Model for Translation Research by PervaizDar1
Guinea Pig as a Model for Translation ResearchGuinea Pig as a Model for Translation Research
Guinea Pig as a Model for Translation Research
PervaizDar111 views
ENTOMOLOGY PPT ON BOMBYCIDAE AND SATURNIIDAE.pptx by MN
ENTOMOLOGY PPT ON BOMBYCIDAE AND SATURNIIDAE.pptxENTOMOLOGY PPT ON BOMBYCIDAE AND SATURNIIDAE.pptx
ENTOMOLOGY PPT ON BOMBYCIDAE AND SATURNIIDAE.pptx
MN6 views
PRINCIPLES-OF ASSESSMENT by rbalmagro
PRINCIPLES-OF ASSESSMENTPRINCIPLES-OF ASSESSMENT
PRINCIPLES-OF ASSESSMENT
rbalmagro11 views
Pollination By Nagapradheesh.M.pptx by MNAGAPRADHEESH
Pollination By Nagapradheesh.M.pptxPollination By Nagapradheesh.M.pptx
Pollination By Nagapradheesh.M.pptx
MNAGAPRADHEESH15 views
Workshop Chemical Robotics ChemAI 231116.pptx by Marco Tibaldi
Workshop Chemical Robotics ChemAI 231116.pptxWorkshop Chemical Robotics ChemAI 231116.pptx
Workshop Chemical Robotics ChemAI 231116.pptx
Marco Tibaldi95 views
CSF -SHEEBA.D presentation.pptx by SheebaD7
CSF -SHEEBA.D presentation.pptxCSF -SHEEBA.D presentation.pptx
CSF -SHEEBA.D presentation.pptx
SheebaD710 views
MSC III_Advance Forensic Serology_Final.pptx by Suchita Rawat
MSC III_Advance Forensic Serology_Final.pptxMSC III_Advance Forensic Serology_Final.pptx
MSC III_Advance Forensic Serology_Final.pptx
Suchita Rawat10 views
Physical Characterization of Moon Impactor WE0913A by Sérgio Sacani
Physical Characterization of Moon Impactor WE0913APhysical Characterization of Moon Impactor WE0913A
Physical Characterization of Moon Impactor WE0913A
Sérgio Sacani42 views
Distinct distributions of elliptical and disk galaxies across the Local Super... by Sérgio Sacani
Distinct distributions of elliptical and disk galaxies across the Local Super...Distinct distributions of elliptical and disk galaxies across the Local Super...
Distinct distributions of elliptical and disk galaxies across the Local Super...
Sérgio Sacani30 views
MODULE-9-Biotechnology, Genetically Modified Organisms, and Gene Therapy.pdf by KerryNuez1
MODULE-9-Biotechnology, Genetically Modified Organisms, and Gene Therapy.pdfMODULE-9-Biotechnology, Genetically Modified Organisms, and Gene Therapy.pdf
MODULE-9-Biotechnology, Genetically Modified Organisms, and Gene Therapy.pdf
KerryNuez121 views

Phenoflow: An Architecture for Computable Phenotypes

  • 1. Phenoflow: An Architecture for Computable Phenotypes Applying FAIR principles to computable phenotype libraries - ELIXIR All Hands Martin Chapman Tuesday 7 June, 2022 King’s College London
  • 2. Overview The story so far... Phenotype models Phenotype tooling Applications The future 1
  • 3. The story so far...
  • 4. HDR UK Phenotype Library Having a standard structure at the definition level goes a long way towards having FAIR phenotypes (e.g. reusable and interoperable). 2
  • 5. What about implementation? SELECT UserID, COUNT(DISTINCT AbnormalLab) AS abnormal lab FROM Patients GROUP BY UserID HAVING abnormal lab > 0; ... We should not, however, ignore the computable forms of these definitions—which either exist already or can be provided—as they have a significant impact on portability (how easy it is to implement a definition). However, storing these implementations alongside our definitions risks compromising their FAIRness: 3
  • 6. (un)FAIR computable phenotypes i It can be quite difficult to determine a suitable connection between a definition and its computable form. As a result, the two are often conflated, reducing intelligibility and thus reusability. 4
  • 7. (un)FAIR computable phenotypes ii Can take many different forms, making them hard to represent in a consistent (and thus intelligible) way. 5
  • 8. (un)FAIR computable phenotypes iii Often have a complex, idiomatic structure, also reducing intelligibility. 6
  • 9. (un)FAIR computable phenotypes iv https://data.ohdsi.org/PhenotypeLibrary Tied to a single standard that is not universally adopted, e.g. OHDSI’s gold standard phenotype library and the OMOP CDM. This reduces interoperability. 7
  • 11. Phenotype models i Phenotype models govern the information required for, and the structure of, phenotype definitions. • They may, for example, govern the logical connectives available to a definition author when producing a definition (e.g. conjunction and disjunction). Would argue that the HDR UK phenotype library achieves FAIR principles by having an inherent model, which all stored definitions must adhere to. A more explicit model can help us to also store computable phenotypes in a FAIR way. 8
  • 12. Phenotype model requirements We need an explicit model that: 1. Connects, yet keeps distinct, a phenotype definition and its computable form. 1.1 A definition needs to remain suitably abstract while making provision for an associated computable counterpart. 9
  • 13. Phenotype model requirements We need an explicit model that: 1. Connects, yet keeps distinct, a phenotype definition and its computable form. 1.1 A definition needs to remain suitably abstract while making provision for an associated computable counterpart. 2. Accommodates (and potentially combines) different computable forms (one-to-many connectivity). 9
  • 14. Phenotype model requirements We need an explicit model that: 1. Connects, yet keeps distinct, a phenotype definition and its computable form. 1.1 A definition needs to remain suitably abstract while making provision for an associated computable counterpart. 2. Accommodates (and potentially combines) different computable forms (one-to-many connectivity). 3. Retains clarity, even in the presence of a complex computable form. 9
  • 15. Phenotype model requirements We need an explicit model that: 1. Connects, yet keeps distinct, a phenotype definition and its computable form. 1.1 A definition needs to remain suitably abstract while making provision for an associated computable counterpart. 2. Accommodates (and potentially combines) different computable forms (one-to-many connectivity). 3. Retains clarity, even in the presence of a complex computable form. 4. Supports a variety of target data formats to enable sharing. 9
  • 16. Phenoflow workflow-based model i The Phenoflow workflow-based model structures definitions as a set of sequential steps, which effectively transition a population of patients to a cohort that exhibit the condition captured. We often describe the model as a non-formal subset of the Common Workflow Language (CWL), where properties such as the ontology used and the step types available are fixed. 10
  • 17. Phenoflow workflow-based model ii Each step in the model consists of three layers: • Abstract: Expresses the logic of that step. Says nothing about implementation. • Functional: Specifies the inputs to, and outputs from, this step (metadata) e.g., the format of an intermediate cohort. • Computational: Defines an environment for the execution of one or more implementation units (e.g. a script, data pipeline module, etc.). 11
  • 18. Phenoflow workflow-based model iii number group id description type step Input Output id description id description extensionA pathA languageA paramsA implementationUnitA Computational Implementation Units pathB languageB paramsB implementationUnitB Abstract Functional Figure 1: Structured phenotype definition model (step) and implementation units. 12
  • 19. (1) Separate, yet connect, a phenotype definition with its computable form The separation of the model into logic and implementation layers provides the required connectivity with a computable form, without affecting abstraction: 2 - icd10 A case is identified in the presence of pa- tients associated with the stated icd10 COVID-19 codes. logic step Input Output covid19 cohort Potential covid19 cases. covid19 cases icd10 covid19 cases, as identified by icd10 coding. csv icd10.py python - for row in c s v r e a d e r : newRow = row . copy () for c e l l in row : i f [ value for value in row [ c e l l ] . s p l i t ( ” , ” ) i f value in codes ] : newRow [ ” covid19 ” ] = ”CASE” ... Computational Implementation Units icd10.js javascript - for ( row of csvData ){ newRow = row . s l i c e ( ) ; for ( c e l l of row ){ i f ( c e l l . s p l i t ( ” , ” ) . f i l t e r ( code=>codes . indexOf ( code )>−1). length ){ newRow . push ( ”CASE” ) ; ... Abstract Functional Figure 2: Individual step of COVID-19 code-based Phenoflow definition and implementation units. 13
  • 20. (2) Accommodate (and potentially combine) all definitions types i The generality of the model allows it to capture information relating to a wide range of different definition types. Similarly, the separation of logic into steps, with clear inputs and outputs, makes each step self-contained, allowing types to be mixed within a single definition. • One step may identify patients based on a list of codes, while a subsequent step may describe the use of more complex NLP techniques in order to identify patients. 14
  • 21. (2) Accommodate (and potentially combine) all definitions types ii 1 case assignment rx t2dm med- abnormal lab A case is identified in the presence of an abnormal lab value (defined as one of three different exacerbations in blood sugar level) AND if medication for this type of diabetes has been prescribed. boolean step Input Output dm cohort- abnormal lab Potential t2dm cases, with ab- normal lab results identified dm cases- case assignment 1 t2dm cases, as iden- tified by first case as- signment rule csv rx t2dm med-abnormal lab.knwf knime -File= Computational Implementation Units rx t2dm med-abnormal lab.py python - ... i f row [ ” t1dm dx cnt ” ] == ”0” and row [ ” t2dm dx cnt ” ] == ”0” and ” t2dm rx dts ” in c s v r e a d e r . fieldnames and ” abnormal lab ” in c s v r e a d e r . fieldnames and row [ ” abnormal lab ” ] == ”1” : row [ ”t2dm” ] = ”CASE” ; ... Abstract Functional Figure 3: Individual step of T2DM logic-based Phenoflow definition and implementation units. 15
  • 22. (3) Prioritise clarity to combat the complexity of implementations On top of definitions now having an expected structure, separation into steps provides a logical flow. Each step provides three descriptions of the functionality it contains, to aid clarity: 1. A single ID, providing an overview of the step’s functionality. 2. A longer description of the functionality contained within the step. 3. A classification of the step under a pre-defined ontology, so that even if the ID and description are not sufficient, a general understanding of the functionality of the step can still be extracted1 . Inputs and outputs to each step provide further information. 1As of now we still use simple classification, e.g. logic, but finer a granularity of types is forthcoming. 16
  • 23. (4) Support a variety of target data formats We add additional contraints to the workflow-based structure to dictate that the first step in a definition is always a data read (and the last is always a cohort output). Because of the modularity of the model structure, we are able to swap in and out the logic, and associated implementation, of the data read step – while the other steps remain unchanged – in order to accommodate multiple data formats. More on this connector approach shortly. 17
  • 25. Phenoflow web architecture The Phenoflow model is complemented by a web architecture that allows us to realise these benefits. Chapman, Martin, et al. “Phenoflow: A microservice architecture for portable workflow-based phenotype definitions.” AMIA, 2021. 18
  • 26. Parse i “A model is only useful if it’s used”. The primary way in which we connect definitions with computable forms is by parsing them from existing sources. 19
  • 27. Parse ii Grab data relating to a definition (codelist, drug list, keywords, more complex logic). Determine where key information is within this data (e.g. a ‘conceptid’ column in a codelist CSV). Populate the information required for the Phenoflow model au- tomatically (e.g. step structure – easier for something like a codelist; may require some intervention with more complex logic). Automatically generate implementation units for each step. Add these to the Phenoflow library ready for download. 20
  • 28. Parsing example—HDR UK national phenomics resource i Have imported, standardised and provided implementations for ∼300 existing definitions as a part of the HDR phenomics resource: Phenoflow Concept Library (Web Interface + API access) CALIBER Gateway User Bulk import API access Workflow link Workflow link Dataset link Phenotype link Dataset link Phenotype Link (API access) Library Platform 21
  • 29. Parsing example—HDR UK national phenomics resource ii These are now directly linked to from the HDR UK Phenotype Portal: 22
  • 30. Author Phenotypes can also be authored directly, by submitting specifications to Phenoflow’s API. • Here, multiple implementations can be provided across the steps, and for each step, either differing in language (e.g. python vs. javascript) or in form (e.g. a code list vs. a trained classifier). 23
  • 31. View Parsed or authored definitions can be viewed under the model (including the multi-dimensional descriptions) within the library. 24
  • 32. Download Pick a connector to be the first step in the workflow, depending on the format of the dataset you are targeting (e.g., OMOP, i2b2, FHIR). • Credentials for data stores are entered locally. Implementations for other steps can also be selected from those uploaded, to match local development capacity, or simply left as they are for out-of-the-box execution. 25
  • 34. Clinical trials i Recruitment in the REST clinical trial (AOMd) was handled using the TRANSFoRm e-source trial platform. In the original trial, archetype-based criteria were translated to concrete implementations (e.g. XPath queries) by TRANSFoRm in order to determine a patient’s eligibility from their EHR. 26
  • 35. Clinical trials ii We developed a new service (PhEM) that instead enables the execution of a computable phenotype against an EHR in order to identify eligible patients at the point-of-care. EHR integration Archetype execution Data Node Connector GP Patient record Study system Archetype translation Phenotype execution microservice (PhEM) The use of PhEM was shown to increase recruitment accuracy. Chapman, Martin, et al. “Using Computable Phenotypes in Point-of-Care Clinical Trial Recruitment”. MIE, 2021. 27
  • 36. Provenance A reverse application: connected Phenoflow with the Data Provenance Template server, a piece of software that holds structured fragments of provenance. These fragments record the evolution of the data (definitions) within Phe- noflow, as they are edited by users, im- proving reusability (R1.2): used used wasAssociatedWith wasGeneratedBy var:updated prov:end vvar:time prov:type phenoflow#Updated zone:id update var:author prov:type phenoflow#Author var:phenotypeAfter phenoflow:description vvar:description phenoflow:name vvar:name prov:type phenoflow#Phenotype var:phenotypeBefore prov:type phenoflow#Phenotype var:step phenoflow:coding vvar:coding phenoflow:doc vvar:doc phenoflow:position vvar:position phenoflow:stepName vvar:stepName phenoflow:type vvar:type prov:type phenoflow#Step zone:id update Fairweather, Elliot, et al. “A delayed instantiation approach to template-driven provenance for electronic health record phenotyping”. IPAW, 2020. 28
  • 38. Other future work i • Always interested in parsing definitions from new sources. • Publish more implementations for complex disease-specific phenotypes, e.g. long covid (LOCOMOTION; phenotypes from NW London GP records) and stroke (KCL NIHR; phenotypes from SLSR). • Increase the library of workflow modules (e.g. types of dataset connectors) ready for download and use. • Automatic data conversion to enable use of different implementation techniques on same dataset, e.g. conversion from CSV to DB to allow use of SQL scripts. 29
  • 39. Thank you! Welcome to visit Phenoflow itself: http://kclhi.org/phenoflow. Learn more about how to interact with the library: https://github.com/kclhi/phenoflow/wiki. View the architecture on Github: https://github.com/kclhi/phenoflow. Publications mentioned: https://martinchapman.co.uk/publications/pheno. Contact: @martin chap man. 30