gsk.com
How will knowledge graphs improve clinical reporting workflows?
Presenters: Alexey Kuznetsov (alexey.k.kuznetsov@gsk.com) & Shannon Haughton (shannon.l.haughton@gsk.com)
16 Nov 2022
21 March 2023 2
Our Problem Statement
Tremendous Resource, Multiple Handoffs, Numerous Transformations
Clinical data flow: Trial design, Collect data, Review observed datasets, Analyse datasets, Review results
Examples: Protocol, Metadata; EDC, Lab data, Randomisation, Others...
Single Study: SDTM* (71 datasets), ADaM (42 datasets), TFLs (<= 250 outputs)
*From 77 legacy datasets
99 modules
Submission (5 – 10 studies): SDTM (350 – 710 to integrate), ADaM (210 – 420 to integrate), TFLs (<= 250 integrated outputs)
Re-transformations (Several Standards)
Re-Mappings (Several Standards)
Imagine a world where anything is possible…
True automation of standard analyses
Ad-hoc requests delivered on demand
(Blinded) Analysis results reviewed in real-time
Manual effort of results validation virtually eliminated
Data visualisations available in-stream
GDPR and patient consent
Risk-based monitoring is proactive
Google-like Q&A system for our trial data
Clinical application of preclinical AI algorithms
From Imagination to Reality
Clinical Knowledge Graph
Let’s move away from isolated data domain silos…
Exposure Domain: Subject = Bob, Study Day = 1, Dosage = 40mg, Trial = Trial1
Medical History Domain: Subject = Bob, Event Date = 2000, Term = Hypertension, Trial = Trial1
Adverse Events Domain: Subject = Bob, Study Day = 1, Term = Headache, Trial = Trial1
Demographics Domain: Sex = M, Age = 75, Subject = Bob, Trial = Trial1
…to ONE contextualised Clinical Knowledge Graph
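The move from domain silos to one graph can be illustrated with a minimal sketch in plain Python. It uses the four example records from this slide; the relationship names (ENROLLED_IN, HAS_RECORD) and the helper functions are assumptions made for illustration and are not taken from the presenters' model.

```python
from collections import defaultdict

# The four domain records from the slide, kept as flat rows (one per silo).
records = [
    {"domain": "Exposure", "Subject": "Bob", "Trial": "Trial1", "Study Day": 1, "Dosage": "40mg"},
    {"domain": "Medical History", "Subject": "Bob", "Trial": "Trial1", "Event Date": 2000, "Term": "Hypertension"},
    {"domain": "Adverse Events", "Subject": "Bob", "Trial": "Trial1", "Study Day": 1, "Term": "Headache"},
    {"domain": "Demographics", "Subject": "Bob", "Trial": "Trial1", "Sex": "M", "Age": 75},
]


def build_graph(rows):
    """Return adjacency: node -> set of (relationship, node).

    Nodes are (label, value) tuples, so the same subject, trial or
    study day is represented by exactly one node across all domains.
    """
    graph = defaultdict(set)
    for i, row in enumerate(rows):
        subject = ("Subject", row["Subject"])
        trial = ("Trial", row["Trial"])
        graph[subject].add(("ENROLLED_IN", trial))   # hypothetical relationship name
        record = (row["domain"], i)                   # one node per source record
        graph[subject].add(("HAS_RECORD", record))   # hypothetical relationship name
        for key, value in row.items():
            if key not in ("domain", "Subject", "Trial"):
                graph[record].add((key, (key, value)))
    return graph


def facts_about(graph, subject_name):
    """Collect every domain fact reachable from one subject node."""
    facts = {}
    for rel, node in graph[("Subject", subject_name)]:
        if rel == "HAS_RECORD":
            facts[node[0]] = {k: v[1] for k, v in graph[node]}
    return facts


graph = build_graph(records)
bob = facts_about(graph, "Bob")
```

One traversal from the Subject node now answers a question that would otherwise need a join across four datasets; `bob` holds the exposure, medical history, adverse event and demographics facts keyed by domain.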
Our Idea
…the Google Translate for our clinical data – helping us translate our complex data landscape to answer important scientific questions
Clinical data flow: Trial design, Collect data, Review observed datasets, Analyse datasets, Review results
Examples: Protocol, Metadata; EDC, Lab data, Randomisation, Others...
99 Modules
GSK Design (1 Standard)
One Connected Data Model
Parallel Processing
SDTM (71), ADaM (42), TFLs (<=250), ISS/ISE
Select Required Standard
ETL Modules
Parallel Processing
Unique Value
Knowledge Graph
Greater control over data privacy
Modern graph analytics & visualisation
Decoupling vertical data pipeline
Accelerated decision making
Goal: Test feasibility, desirability & sustainability of idea
Phased agile & risk-based approach with predefined success criteria
2021 H1, 2021 H2, 2022 H1, 2022 H2
EXPERIMENT 1: Can we ingest SDTM data into CLD MVP?
EXPERIMENT 2: Can we enrich CLD MVP model with ADaM Transformations?
EXPERIMENT 3: Can we analyse, report and egress TLFs from CLD MVP?
MVP PILOT: Can we use the CLD MVP to perform QC for an ongoing trial?
Continuous learning and iteration
How do we load tables into graph
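The slides for this step are diagrams that did not survive extraction, so what follows is only a minimal sketch of one common way to turn a table into a graph, not the presenters' method. It assumes each row becomes a row node and each distinct (column, value) pair becomes a single shared value node. The toy table, the `load_table` helper and the `HAS_` relationship prefix are all hypothetical.

```python
import csv
import io

# Toy Demographics table (values invented for illustration).
DM_CSV = """USUBJID,SEX,AGE
S1,M,75
S2,F,68
S3,M,75
"""


def load_table(text, table_name, nodes=None, edges=None):
    """Load CSV text into sets of nodes and edges.

    Each row becomes one row node; each distinct (column, value) pair
    becomes one value node that is shared by every row holding it.
    """
    nodes = set() if nodes is None else nodes
    edges = set() if edges is None else edges
    for i, row in enumerate(csv.DictReader(io.StringIO(text))):
        row_node = (table_name, i)
        nodes.add(row_node)
        for column, value in row.items():
            value_node = (column, value)
            nodes.add(value_node)  # set semantics: merge instead of duplicating
            edges.add((row_node, "HAS_" + column, value_node))
    return nodes, edges


nodes, edges = load_table(DM_CSV, "DM")
```

Because value nodes are merged, the two subjects aged 75 point at the same `("AGE", "75")` node. Passing the returned `nodes` and `edges` back into `load_table` for a second table would link rows from different domains through the values they share.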
How do we store machine readable derivation metadata as graph
aage = scrndt - brthdtc + 1
modular
dependent
orchestration
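As a rough illustration of modular, dependency-driven orchestration, the sketch below stores each derivation as a node that names its inputs and runs the derivations in dependency order. The `aage` rule follows the formula on the slide, evaluated here in days, which is an assumption because the slide gives no unit. The second derivation, `aage_weeks`, is hypothetical and exists only to show a dependency on `aage`. Rules are Python lambdas for brevity; a machine-readable store would hold them as data instead.

```python
from datetime import date
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Derivation metadata: every derived variable lists its inputs and its rule.
# Declared out of order on purpose; the graph decides the execution order.
derivations = {
    # Hypothetical derivation, added only to show a dependency on aage.
    "aage_weeks": {"inputs": ["aage"], "rule": lambda r: r["aage"] // 7},
    # From the slide: aage = scrndt - brthdtc + 1 (computed in days here).
    "aage": {
        "inputs": ["scrndt", "brthdtc"],
        "rule": lambda r: (r["scrndt"] - r["brthdtc"]).days + 1,
    },
}


def run_derivations(record, derivations):
    """Apply each derivation once all of its inputs are available."""
    depends_on = {name: set(spec["inputs"]) for name, spec in derivations.items()}
    result = dict(record)
    for name in TopologicalSorter(depends_on).static_order():
        if name in derivations:  # source variables carry no rule
            result[name] = derivations[name]["rule"](result)
    return result


subject = {"scrndt": date(2022, 1, 10), "brthdtc": date(2022, 1, 1)}
derived = run_derivations(subject, derivations)
```

Each derivation is a self-contained module, and the ordering comes from the dependency edges alone, so adding a new derived variable does not require editing a hand-maintained run sequence.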
How do we store machine readable summary statistics as graph
specification of statistics
the what
the how
qualifiers
SEX Mean Value
F 32.4
M 34.0
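One way to read "the what", "the how" and "qualifiers" is as the analysed variable, the statistical method and the grouping variables; that reading is an assumption, since the slide diagrams were not extracted. Under it, a statistic specification can be stored as data and evaluated generically, as in the sketch below. The toy records, the `spec` layout and the `summarise` helper are hypothetical, and the values are invented, so the output is not meant to reproduce the table above.

```python
from collections import defaultdict
from statistics import mean

# Toy analysis records (values invented for illustration).
records = [
    {"SEX": "F", "AGE": 30},
    {"SEX": "F", "AGE": 34},
    {"SEX": "M", "AGE": 33},
    {"SEX": "M", "AGE": 35},
]

# The specification as data: what to summarise, how, and by which qualifiers.
spec = {"what": "AGE", "how": "mean", "qualifiers": ["SEX"]}

# Registry mapping method names in the specification to implementations.
METHODS = {"mean": mean, "min": min, "max": max}


def summarise(rows, spec):
    """Group rows by the qualifier values and apply the requested method."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[q] for q in spec["qualifiers"])
        groups[key].append(row[spec["what"]])
    how = METHODS[spec["how"]]
    return {key: how(values) for key, values in groups.items()}


result = summarise(records, spec)
```

Changing `spec["how"]` to `"max"` or adding a second qualifier alters the result without touching `summarise`, which is the point of keeping the specification separate from the code that executes it.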
Open source assets released
To be released:
• tab2neo
• neo4cdisc
GSK-Biostatistics/neointerface: NeoInterface - Neo4j made easy for Python programmers! (github.com)
read/write: csv, xls, xlsx, xpt, sas7bdat, rda
dm/ae/lb/...
dm/ae/../custom
Learnings that helped us accelerate our idea
Pre-defined success criteria critical in quick decision making
Prioritise 1 idea, test it, refine it, test it, refine it…
Focused innovation challenge can greatly help test disruptive ideas
Understand pain points & test ideas to drive informed innovation
Timeboxed focused sprints are great to inform the path ahead
Special thanks to…
Jorine Putter
Michael Rimler
Samantha Warden
Kirsten Langendorf
Johannes Ulander
Dave Iberson-Hurst
Eleanor Sparling
Rachel Ren
James Sefton
William McDermott
Jonathan Deacon
Benjamin Grinsted
Julian West
It takes a village to raise an idea…