Phenoflow: A Microservice Architecture for Portable Workflow-based Phenotype Definitions

AMIA Informatics Summit, 2021

  1. Phenoflow: A Microservice Architecture for Portable Workflow-based Phenotype Definitions. Martin Chapman, King’s College London. Session S25: Phenotyping: Implementation and Application. #IS21
  2. Learning Objectives: After participating in this session the learner should be better able to understand the current issues with converting phenotype definitions into executable code, and how a novel structured phenotype definition can improve clarity and reduce implementation burden. 2021 Informatics Summit | amia.org
  3. Disclosure: I and my spouse/partner have no relevant relationships with commercial interests to disclose.
  4. Phenotype definition vs. computable form: Phenotype definitions are designed to ensure portability across multiple use cases by providing an abstract outline of functionality (e.g. a data flow diagram, a code list, etc.), which is then realised as a computable phenotype for a given dataset (e.g. SQL script, Python code, etc.). [Figure labels: Definition; Computable Form]
  5. Definition challenges:
     1. Complex phenotype definitions, both in terms of structure and terminology, are needed for accuracy but reduce portability.
     2. An abstract definition says little about how to realise the phenotype in practice (i.e. from a technical perspective), also reducing portability.
  6. Workflow-based model: We introduce a new workflow-based model for the definition of a phenotype, designed to address these issues. The layers of the model are:
     1. Abstract: Expresses the logic of a phenotype through a set of simple, sequential, potentially nested steps, each of which is annotated with multiple descriptions in order to tackle complexity.
     2. Functional: Specifies the metadata of the entities passed between the operations within the abstract layer, e.g. the format of an intermediate cohort.
     3. Computational: Defines an environment for the execution of one or more implementation units (e.g. a script, data pipeline module, etc.) for each step in the abstract layer, providing a template for development.
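
As a minimal sketch of how the three layers might sit together for each step, the following Python models a step with fields for each layer and checks that adjacent steps agree on the data they exchange. All class, field, step, and file names here are hypothetical and chosen for illustration; they are not taken from the Phenoflow model itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    """One step of a phenotype definition, described at all three layers."""
    # Abstract layer: what the step does, described more than once.
    name: str
    short_description: str
    long_description: str
    # Functional layer: metadata of the entities passed between steps.
    input_format: str
    output_format: str
    # Computational layer: implementation units that can realise the step.
    implementation_units: List[str] = field(default_factory=list)


def check_definition(steps: List[Step]) -> List[str]:
    """Return a list of problems; an empty list means the steps are consistent."""
    problems = []
    # Each step must consume what the previous step produces.
    for previous, current in zip(steps, steps[1:]):
        if previous.output_format != current.input_format:
            problems.append(f"{previous.name} -> {current.name}: format mismatch")
    # Each step needs at least one implementation unit to be executable.
    for step in steps:
        if not step.implementation_units:
            problems.append(f"{step.name}: no implementation unit")
    return problems


# Hypothetical three-step definition (names and formats are made up).
definition = [
    Step("read-potential-cases", "Read cases", "Read potential cases from a local file.",
         "csv file", "potential cases", ["read-potential-cases.py"]),
    Step("diagnosis-code", "Check codes", "Flag patients with a relevant diagnosis code.",
         "potential cases", "flagged cases", ["diagnosis-code.py"]),
    Step("output-cases", "Write cases", "Write the identified cohort to a file.",
         "flagged cases", "csv file", ["output-cases.py"]),
]

print(check_definition(definition))
```

Keeping the functional metadata on each step is what allows a mismatch between steps to be detected before any implementation unit is run.
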
  7. Workflow-based model (figure)
  8. Workflow-based model (figure)
  9. Phenoflow: A researcher is not expected to develop definitions under this model directly. Instead, definitions are authored using an online library, Phenoflow, which can generate a computable form from a definition as a Common Workflow Language (CWL) workflow. Phenoflow comprises several microservices that enable this generation process.
  10. Phenoflow: Authoring a new definition under our model (figure). Phenotypes can also be authored via an API (with an accompanying Python client), or by bulk importing existing definitions.
  11. Phenoflow: Proceed with implementation by matching each step in the model to an implementation unit (figure).
  12. Phenoflow: The CWL workflow can then be generated (based on the definition and the supplied implementation units), downloaded, and executed against a local dataset in order to identify a given cohort (figure).
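
To give a rough idea of what a generated workflow could look like, the sketch below chains a list of step names into a linear CWL `Workflow` description and prints it as JSON (CWL documents may be written in JSON or YAML). This is an assumption-laden illustration, not Phenoflow's actual generator or output: the step names, the `potential_cases` input, the `output` identifier, and the per-step `.cwl` tool files are all hypothetical.

```python
import json


def build_workflow(step_names):
    """Chain the named steps into a linear CWL workflow description."""
    steps = {}
    source = "potential_cases"  # workflow-level input feeding the first step
    for name in step_names:
        steps[name] = {
            "run": f"{name}.cwl",              # hypothetical tool description per step
            "in": {"potential_cases": source},  # consume the previous step's output
            "out": ["output"],
        }
        source = f"{name}/output"               # becomes the next step's input
    return {
        "cwlVersion": "v1.0",
        "class": "Workflow",
        "inputs": {"potential_cases": "File"},
        "outputs": {"cases": {"type": "File", "outputSource": source}},
        "steps": steps,
    }


workflow = build_workflow(["read-potential-cases", "diagnosis-code", "output-cases"])
print(json.dumps(workflow, indent=2))
```

A document of this shape, together with the tool descriptions it references and a job file naming the local dataset, is what a CWL runner would execute.
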
  13. Evaluation and results: Determine the suitability of the model as a representation format, and the suitability of the CWL implementations:
     1. Selected a T2DM phenotype definition (logic-based) and an example computable form (phekb.org/phenotype/type-2-diabetes-mellitus).
     2. Selected a research cohort from Northwestern University (26,406 patients).
     3. Re-authored the definition according to our model, using Phenoflow.
     4. Generated a CWL implementation of the definition, using Phenoflow.
     5. Executed both computable forms against the dataset, confirming the same results using a gold standard.
  14. Evaluation and results: Determine the suitability of the model as a representation format, and the suitability of the generated implementations:
     6. Repeated for a COVID-19 phenotype (code-based), taken from covid19-phenomics.org, and a set of 1468 individuals who tested positive for COVID-19 at Guy's and St. Thomas' NHS Foundation Trust (GSTT).
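
The comparison in step 5 can be sketched as a set comparison between the cohorts returned by the two computable forms and a gold-standard cohort. The function name and the patient identifiers below are made up for illustration; the actual evaluation procedure and data are not described at this level of detail in the slides.

```python
def compare_cohorts(original_form, generated_form, gold_standard):
    """Compare the cohorts identified by two computable forms with a gold standard."""
    original, generated, gold = set(original_form), set(generated_form), set(gold_standard)
    return {
        "same_cohort": original == generated,
        "only_in_original": sorted(original - generated),
        "only_in_generated": sorted(generated - original),
        "original_matches_gold": original == gold,
        "generated_matches_gold": generated == gold,
    }


# Toy patient identifiers, not real data.
report = compare_cohorts(
    original_form=["p1", "p4", "p7"],
    generated_form=["p7", "p1", "p4"],
    gold_standard=["p1", "p4", "p7"],
)
print(report)
```
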
  15. Evaluation and results: Showed portability improvements in terms of clinical knowledge requirements and programming expertise using the Knowledge conversion, clause Interpretation, and Programming (KIP) phenotype portability scoring system (Shang et al., JBI, 2019).

     Table 1: KIP scores indicating the portability of traditional code-based (COVID-19) and logic-based (Type 2 Diabetes) phenotype definitions and their structured counterparts. *High scores = less portable.

                          Knowledge   Clause   Programming   Total*
     Traditional code     0           2        2             4
     Structured code      0           0        0             0
     Traditional logic    1           1        2             4
     Structured logic     0           1        0             1
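
As a quick check of Table 1, the short sketch below recomputes each total as the sum of the three component scores, which is consistent with every row of the table, and reports the reduction from each traditional definition to its structured counterpart. The variable and function names are illustrative only.

```python
# (Knowledge, Clause, Programming) scores from Table 1.
KIP_SCORES = {
    "Traditional code": (0, 2, 2),
    "Structured code": (0, 0, 0),
    "Traditional logic": (1, 1, 2),
    "Structured logic": (0, 1, 0),
}


def kip_total(scores):
    """Total KIP score: the sum of the three components (higher = less portable)."""
    return sum(scores)


totals = {name: kip_total(scores) for name, scores in KIP_SCORES.items()}

# Reduction in total score when moving from a traditional to a structured definition.
reduction = {
    kind: totals[f"Traditional {kind}"] - totals[f"Structured {kind}"]
    for kind in ("code", "logic")
}

print(totals)
print(reduction)
```
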
  16. Definition challenges:
     1. Complex phenotype definitions, both in terms of structure and terminology, are needed for accuracy but reduce portability.
        1. The Phenoflow model provides a specific structure and intelligible multi-dimensional descriptions to enable both accurate and portable definitions.
     2. An abstract definition says little about how to realise the phenotype in practice (i.e. from a technical perspective), also reducing portability.
        1. The Phenoflow model includes information to guide implementation, improving portability.

     The Phenoflow library has an additional impact on portability, beyond the model alone:
  17. Library impact on portability: Adding an alternate implementation for an abstract step (figure).
  18. Library impact on portability: Selecting which type of implementation units to include in the computable form, depending on local development requirements (figure).
  19. Future work:
     1. Leveraging the multi-layer model to introduce advanced library search criteria, and novel ways to search (e.g. uploading existing definitions).
     2. Further leveraging the multi-layer model to express relationships between phenotypes (e.g. sub-phenotypes) at each layer of the model.
     3. Increasing the library of workflow modules (e.g. types of dataset connectors) ready for download and use.
        1. We already provide connectors for i2b2 and OMOP (as well as local CSV files).
     4. Automatic data conversion to enable the use of different implementation techniques on the same dataset, e.g. conversion from CSV to DB to allow the use of SQL scripts.
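
The data conversion idea in the last item can be illustrated with a small sketch that loads CSV data into an in-memory SQLite database so that a SQL-based implementation unit could run against it. This is only one possible approach under stated assumptions: the table name, column names, and sample rows are hypothetical, and every column is stored as text for simplicity.

```python
import csv
import io
import sqlite3


def csv_to_sqlite(csv_text, table="patients"):
    """Load CSV text into an in-memory SQLite table, storing every column as TEXT."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames  # taken from the CSV header row
    conn = sqlite3.connect(":memory:")
    column_defs = ", ".join(f'"{column}" TEXT' for column in columns)
    conn.execute(f'CREATE TABLE "{table}" ({column_defs})')
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})',
        [tuple(row[column] for column in columns) for row in reader],
    )
    return conn


# Toy data: patient identifiers and diagnosis codes, made up for illustration.
SAMPLE_CSV = "patient_id,code\np1,E11\np2,I10\np3,E11\n"

conn = csv_to_sqlite(SAMPLE_CSV)
# A SQL script can now be applied to data that started life as a CSV file.
cases = [
    row[0]
    for row in conn.execute(
        "SELECT patient_id FROM patients WHERE code = ? ORDER BY patient_id", ("E11",)
    )
]
print(cases)
```
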
  20. Links: https://kclhi.org/phenoflow and https://github.com/kclhi/phenoflow
  21. Thank you!
