Data Integrity in Decentralized Clinical Trials (DCTs)
Clifton Chow, PhD
HEOR Consultant
Actu-Real, Inc.
Pierre Etienne, MD
Co-Founder & CMO
Actu-Real, Inc.
Chitra Lele, PhD
Founder and President
Actu-Real, Inc.
Daniel Gutierrez, PhD
Director, Customer Solutions
Clinerion Ltd
Copyright 2022. All Rights Reserved. Contact Presenter for Permission
Centralized Monitoring for DCTs
Chitra Lele, PhD
Founder and President
Actu-Real, Inc.
chitra.lele@actu-real.com
Decentralized Trials (DCTs)
Benefits of DCTs
Ø For sponsors:
    Ø Improved recruitment speed
    Ø Reduction in time and cost
    Ø Better outcomes: increased patient retention and diversity
Ø For patients:
    Ø Patient-friendly enrollment and participation
    Ø Better patient experience
    Ø Improved retention leading to better outcomes
Digital Enablers of DCTs across the clinical continuum
Ø Electronic consent (eConsent)
Ø Telehealth visits
Ø Data capture at source
Ø Electronic patient-reported outcomes (ePRO)
Ø Electronic clinical outcome assessment (eCOA)
Ø Wearables, sensors
Ø Remote monitoring
Ø Patient engagement platforms
Data Challenges and Opportunities
Data Challenges
Ø High volume and heterogeneous data
Ø Widely disparate data sources: dynamic and not following data standards
Ø Security risks
Ø Data monitoring challenges
Opportunities
Ø Access to data amenable to advanced analytics
Ø Detect patterns and explore novel endpoints
Digital data in DCTs and Data Cleaning
Ø A large majority of trials are projected to use Digital Health Technologies in the next few years
Ø Capture both objective (e.g., sensors/wearables) and subjective (e.g., ePRO) data streams that complement each other and significantly enhance evidence generation
Ø The depth and breadth of data from all these sources require monitoring and data management methods outside the traditional data cleaning and reconciliation activities.
Centralized Monitoring and Data Management
Centralized Monitoring
Traditional monitoring (retrospective SDV) not possible
Ø Data is collected at source or is patient-reported
Centralized monitoring: holistic data surveillance approach
Ø Advanced statistical and analytical tools to identify data gaps, anomalies
Ø Identify study issues more serious than those identified by transactional data reviews
Data aggregation for optimized approach to centralized monitoring
Ø Aggregation of data from siloed and disparate systems
Ø Unified picture of the patient journey, site performance, and overall trial health
Centralized monitoring strategies built on unified data platforms can help in real-time aggregation and analysis of data, systemic risk monitoring and early issue detection, and provide insights to a variety of functional teams across the trial continuum
Centralized Data Management
Ø Traditional data management approach of data review on CRFs and edit checks is not relevant
Ø Centralized data management and verification is required
Ø Real-time and proactive
Ø Scalable
Ø Data checks at point of capture
Ø Continuous oversight of data integration, flow, and quality
Ø Risk-based data management strategies
Ø Risk: due to collection and aggregation of data across disparate sources and modalities
Ø The focus is on core processes and critical data points most likely to impact data integrity and interpretability
Monitoring and Data Management are not as distinct in the centralized approach
Centralized Monitoring Considerations: non-CRF data
Ø eCOA and electronic patient-reported outcomes (ePRO): challenge to ensure patients are compliant with completing questionnaires and diaries
Ø Need to ensure consistency between patient-reported data and data collected during procedures or assessed by the physician, i.e., merging subjective and objective outcomes
Ø Comparison of data during on-site visits vs data collected remotely during home-health visits
Ø To detect misconduct: system audits, geo tags
Ø Lag/variability in data flow: is there a signal?
Ø How to define non-compliance in case of data streaming?
Ø Higher risk of fraudulent data/misconduct in DCTs with data from ePROs, wearables, etc.?
Centralized Statistical Monitoring (CSM)
CSM
Ø Centralized Statistical Monitoring: Centralized Monitoring + Statistical Monitoring
Ø Statistical monitoring:
    Ø Complex statistical algorithms recommended by TransCelerate to discover data outliers and anomalies, to inform monitoring, escalation or communication actions
    Ø Used on all data and all variables that influence data quality
    Ø Relies on the highly structured nature of data; each protocol is expected to be implemented consistently at all sites
    Ø The multivariate structure and/or time dependence of variables are sensitive to deviations and hard to copy
    Ø Fabricated data, even if plausible from a univariate perspective, are likely to exhibit abnormal multivariate patterns that are detectable statistically
Ø Bayesian CSM
    Ø Bayesian finite mixture models (FMM) are used to model patient outcome values of both atypical and typical sites.
    Ø Assuming that the majority of the sites are ‘normal’, the ‘body distribution’ is chosen as the component with the largest mixture parameter value in the finite mixture model.
    Ø Atypical sites are detected based on the posterior predictive distribution of a normal site's outcome values, derived from only the chosen body distribution.
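The body-distribution idea above can be illustrated with a deliberately simplified sketch. It is not the Bayesian method itself: it fits a two-component Gaussian mixture to site-level means with plain EM (a swapped-in, non-Bayesian technique), takes the component with the largest mixing weight as the body distribution, and uses a plug-in three-standard-deviation interval in place of a posterior predictive distribution. The site names, values, function names and the 3-SD cut-off are all hypothetical.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

def fit_two_component_mixture(values, iterations=100, min_sd=0.5):
    """Fit a two-component 1-D Gaussian mixture with plain EM (not Bayesian)."""
    ordered = sorted(values)
    n = len(values)
    grand_mean = sum(values) / n
    grand_sd = math.sqrt(sum((v - grand_mean) ** 2 for v in values) / n)
    # Crude initialisation: one component at the median, one at the maximum.
    means = [ordered[n // 2], ordered[-1]]
    sds = [max(grand_sd, min_sd)] * 2
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of each component for each site value.
        resp = []
        for v in values:
            dens = [weights[k] * normal_pdf(v, means[k], sds[k]) for k in range(2)]
            total = sum(dens)
            resp.append([d / total for d in dens] if total > 0 else [0.5, 0.5])
        # M-step: update weights, means and standard deviations.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-9:
                continue
            weights[k] = nk / n
            means[k] = sum(r[k] * v for r, v in zip(resp, values)) / nk
            var = sum(r[k] * (v - means[k]) ** 2 for r, v in zip(resp, values)) / nk
            sds[k] = max(math.sqrt(var), min_sd)  # floor avoids a collapsed component
    return weights, means, sds

def flag_atypical_sites(site_means, n_sd=3.0):
    """Flag sites whose mean lies outside the body component's plug-in interval."""
    names = list(site_means)
    values = [site_means[name] for name in names]
    weights, means, sds = fit_two_component_mixture(values)
    body = 0 if weights[0] >= weights[1] else 1  # largest mixing weight = body distribution
    return [name for name in names
            if abs(site_means[name] - means[body]) > n_sd * sds[body]]

# Hypothetical site-level mean systolic blood pressure (mmHg).
site_mean_sbp = {"S01": 128.0, "S02": 131.5, "S03": 129.2, "S04": 130.8, "S05": 127.9,
                 "S06": 132.1, "S07": 130.0, "S08": 129.5, "S09": 150.0, "S10": 131.0}
flagged_sites = flag_atypical_sites(site_mean_sbp)
print(flagged_sites)
```

A full Bayesian treatment would place priors on the mixture parameters and compare each site with a posterior predictive distribution; the sketch only shows how selecting the dominant component isolates the reference behaviour of typical sites.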
Benefits
Data quality checks on all trial centers at the subject and site level
1. Statistical analysis of the data in real time to identify sites that need further investigation due to unusual data patterns; analysis of site characteristics to define poorly performing sites
2. Identification of data trends not easily detected on-site, such as data consistency and accuracy or missing data
3. Remote verification of critical source data
Tangible benefits and assistance in improving RBM strategies
Early identification of anomalies in data, using the latest technologies and complex statistical algorithms, provides an opportunity to address issues as they are uncovered and reduces the risk of regulatory submission failure
Improves data integrity of a clinical trial
CSM method can be used as an oversight tool for the CROs
TransCelerate Methods and Experiment
Statistical Monitoring: Methods
Descriptive statistics to identify outliers or influential observations
Comparison of study-wise and subject-specific confidence intervals with multiplicity-adjusted alpha
SD analysis: repeated measures model for SD estimates
Correlations: subject, site & study level correlations between variables; compared using CIs
Pearson correlation analysis between key variables to check if data were copied across sites
1-way ANOVA to compare groups; Chi-square goodness-of-fit test to check difference in frequency distributions
Mahalanobis distance (MD): multidimensional risk assessment method based on multidimensional risk score; flexible method to combine dimensions and identify risk factors; inliers, outliers
Fisher’s exact/Chi-square test, descriptive statistics for digit preference, rounding, carry-over effect, repeated values
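As a minimal sketch of the Mahalanobis distance listed among the methods, the following computes the distance of each subject from the study-wide mean for two variables (SBP and DBP), inverting the 2x2 sample covariance matrix by hand. The data values and function name are hypothetical, and the slides do not state which cut-offs were used to call a subject an outlier or inlier, so none is applied here.

```python
import math

def mahalanobis_2d(points):
    """Mahalanobis distance of each (x, y) point from the sample mean."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    # Sample covariance matrix entries (denominator n - 1).
    sxx = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)
    det = sxx * syy - sxy * sxy
    distances = []
    for x, y in points:
        dx, dy = x - mean_x, y - mean_y
        # d^2 = [dx dy] * inverse(S) * [dx dy]^T, with the 2x2 inverse written out.
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(math.sqrt(d2))
    return distances

# Hypothetical (SBP, DBP) pairs; the ninth pair is plausible variable by
# variable but breaks the usual SBP/DBP relationship.
vitals = [(120, 80), (125, 82), (118, 78), (130, 85), (122, 79),
          (128, 84), (119, 77), (124, 81), (121, 95), (126, 83)]
md = mahalanobis_2d(vitals)
most_unusual = max(range(len(md)), key=lambda i: md[i])
print(most_unusual, round(md[most_unusual], 2))
```

The ninth subject has the largest distance even though neither value is extreme on its own, which is the multivariate behaviour the method relies on; unusually small distances would point to inliers.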
TransCelerate BioPharma’s experiment: Methodology
Tested statistical monitoring on a data set from a chronic obstructive pulmonary disease (COPD) clinical study with 178 sites and 1554 subjects. Fabricated data were selectively implanted in 7 sites and 43 subjects by expert clinicians in COPD; the data set was partitioned to simulate studies of different sizes
Vital signs are evaluated at each visit, hence SBP, DBP and pulse rate were selected
FEV1 and FVC were selected because the data are collected serially as well as produced mechanistically and evaluated centrally, thus significantly reducing the chance for human error
Selected data were deleted and replaced with fabricated plausible data
Reference:
Knepper et al., Statistical Monitoring in Clinical Trials: Best Practices for Detecting Data Anomalies Suggestive of Fabrication or Misconduct, Therapeutic Innovation & Regulatory Science 2016, Vol. 50(2) 144-154
TransCelerate BioPharma’s experiment: Analysis
Ø MD for each subject & studywide was calculated separately for vital signs and spirometry measurements.
    Ø A large MD for a subject would correspond to an outlier, and a small MD would correspond to an inlier.
Ø Carryover effect/repeated values
    Ø Carryover, defined as an exact match of a value for a subject from one visit to the next, was calculated.
    Ø Repeated values, defined as the number of identical values for a subject within a visit (for spirometry) or overall, were calculated.
Ø Digit preference and rounding
    Ø For each subject, last digit frequency distribution was compared to other subjects at that site and across the study using either a Chi-square or Fisher exact test, as appropriate, and by comparing the mean and SD of last digit value within a subject to studywide distributions using the CIs.
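The carryover and digit-preference checks described above can be sketched as follows. This is an illustration under stated assumptions rather than the published procedure: the reference distribution for last digits is taken to be uniform (the experiment compared each subject with other subjects at the site and across the study), only the chi-square statistic is computed, and the function names and readings are hypothetical.

```python
from collections import Counter

def carryover_count(values):
    """Number of visits whose value exactly matches the previous visit."""
    return sum(1 for prev, cur in zip(values, values[1:]) if prev == cur)

def last_digit_chi_square(values):
    """Chi-square goodness-of-fit statistic of last digits against a uniform distribution."""
    digits = [abs(int(v)) % 10 for v in values]
    counts = Counter(digits)
    expected = len(digits) / 10
    return sum((counts.get(d, 0) - expected) ** 2 / expected for d in range(10))

# Hypothetical systolic blood pressure readings for one subject across visits.
visit_values = [120, 120, 118, 118, 118, 121]
print(carryover_count(visit_values))

# Hypothetical readings that always end in 0, suggesting rounding.
rounded_values = [120, 130, 120, 140, 130, 120, 110, 130, 140, 120,
                  150, 120, 130, 110, 120, 140, 130, 120, 130, 120]
chi_square = last_digit_chi_square(rounded_values)
print(chi_square)
```

With nine degrees of freedom the 5% critical value of the chi-square distribution is about 16.92, so a statistic far above that indicates a strong digit preference. When expected counts per digit are small, as they are for a single subject, the chi-square approximation is unreliable, which is presumably why the Fisher exact test is named as the alternative.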
TransCelerate BioPharma’s experiment: Results
Ø Results
    Ø The algorithm identified 11 sites (19%), 19 sites (31%), 28 sites (16%), and 45 sites (25%) as having potentially fabricated data for studies 2A, 2, 1A, and 1, respectively.
    Ø For study 2A, 3 of 7 sites with fabricated data were detected, 5 of 7 were detected for studies 2 and 1A, and 6 of 7 for study 1.
    Ø Except for study 2A, the algorithm had good sensitivity and specificity (>70%) for identifying sites with fabricated data.
Ø Conclusions
    Ø Recommendation of a cross-functional, collaborative approach to statistical monitoring that can adapt to study design and data source and use a combination of statistical screening techniques and confirmatory graphics.
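The sensitivity figures implied by the detection counts above can be worked out directly, since each simulated study contained 7 sites with fabricated data. The short calculation below uses only those reported counts; specificity is not recomputed because the slides do not give the total number of sites in each simulated study.

```python
# Fabricated sites detected per simulated study, as reported in the results.
detected = {"2A": 3, "2": 5, "1A": 5, "1": 6}
FABRICATED_SITES = 7  # sites with implanted fabricated data

# Sensitivity = detected fabricated sites / all fabricated sites.
sensitivity = {study: hits / FABRICATED_SITES for study, hits in detected.items()}
for study, value in sensitivity.items():
    print(f"Study {study}: sensitivity = {value:.1%}")
# Study 2A: sensitivity = 42.9%
# Study 2: sensitivity = 71.4%
# Study 1A: sensitivity = 71.4%
# Study 1: sensitivity = 85.7%
```

These values are consistent with the statement that sensitivity exceeded 70% in every study except 2A.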
Limitation and Future Trend
Limitation
Ø The effectiveness of statistical monitoring is questionable in low-data-volume conditions
    Ø A site may not be flagged if it is small (small sample size)
    Ø A small site may be flagged because of a single subject, increasing false-positive rates
Ø An adaptive algorithm that uses different cut-offs for subject and site flagging depending on the stage and size of the study may be required
The success of centralized monitoring activities relies on the skills of people, well-defined processes, and technologies that enable the translation of data into information, and resulting actions and decisions.
Future Trend
Ø Increased use of AI and ML for pattern/anomaly identification, review using data visualization and automated statistical analyses
Thank You
Clinerion: A Global Interoperable Patient Network for Trial Feasibility & Recruitment
Daniel Gutierrez, PhD
Director, Customer Solutions
Clinerion Ltd
daniel.gutierrez@clinerion.com
Clinerion Ltd. CONFIDENTIAL
Clinerion Global Research Network
TOTAL CONTRACTED: 264.3 M Patients @ 300+ Sites
LIVE (online, in real-time): 248.8 M Patients
Argentina
ꟷ Instituto de Diagnóstico e Investigaciones Metabólicas (IDIM)
Switzerland
ꟷ University Hospital Basel
ꟷ Privatklinikgruppe Hirslanden
(17 clinics)
USA
ꟷ MEDICARE/MEDICAID claims
data.
Colombia
ꟷ Hospital Pablo Tobón Uribe
ꟷ Clínica FOSCAL Internacional
Uruguay
ꟷ CASMU
Taiwan R.O.C.
ꟷ Show Chwan Hospital
Turkey
ꟷ Istanbul University (5
hospitals)
ꟷ Malatya İnönü University
ꟷ Konya Necmettin Erbakan
University
ꟷ Çekmece Mehmet Akif Public
Hospital
ꟷ Baskent University (8
hospitals)
ꟷ Karadeniz Technical University
- Medical Faculty
ꟷ Ege University Medical Faculty
Hospital
South Korea
ꟷ Korea University Medical
Center
United Arab Emirates
ꟷ UAE Claims data
ꟷ Cleveland Clinic
India
ꟷ Jehangir Clinical Development
Centre
ꟷ Pawana Hospital
Saudi Arabia
ꟷ E1 Cluster, Dammam (22 hospitals), incl.:
ꟷ King Fahad Specialist
Hospital
ꟷ Maternity and Children
Hospital
ꟷ Dammam Medical
Complex
ꟷ Qatif
ꟷ Jubail General Hospital
ꟷ Alkhafji Hospital
ꟷ King Abdullah International
Medical Research Center
(KAIMRC) (5 hospitals)
Bulgaria
ꟷ DCC Ascendent
Romania
ꟷ Pediatrics Hospital Louis
Turcanu Timisoara
France
ꟷ Centre Hospitalier De Troyes
Spain
ꟷ Hospitales de Madrid (17
hospitals), incl.:
ꟷ HM Madrid
ꟷ HM Montepríncipe
ꟷ HM Torrelodones
ꟷ HM Sanchinarro
ꟷ HM Nuevo Belén
ꟷ HM Puerta del Sur
ꟷ HM Vallés
ꟷ HM San Francisco
ꟷ HM Regla
ꟷ HM Modelo
ꟷ HM Belén
ꟷ HM La Esperanza
ꟷ HM Rosaleda
ꟷ HM Vigo
ꟷ HM Nou Delfos
ꟷ HM Sant Jordi
ꟷ HM Nens
Greece
ꟷ Papageorgiou Hospital
ꟷ Athens Medical Group
ꟷ Hygeia Hospital Athens
ꟷ Theageneio Anticancer
Hospital
Estonia
ꟷ Tartu University Hospital
ꟷ North Estonia Medical Center
Armenia
ꟷ Arabkir Hospital
Hungary
ꟷ Svabhegy
Gyermekgyógyintézet
ꟷ CityZen Health Centre
ꟷ Multiklinika
ꟷ Kastelypark Clinic
ꟷ University of Debrecen
United Kingdom
ꟷ Wellbeing primary care
network (>4,000 sites)
Georgia
ꟷ Georgian-Dutch Hospital
ꟷ Tbilisi Heart and Vascular
Center
ꟷ Medison clinics
ꟷ EVEX Hospitals (52 hospitals)
Poland
ꟷ Szpital Specjalistyczny w
Brzozowie
ꟷ EMC MEDICAL INSTITUTE SA
(Penta Hospitals) (16 sites)
Serbia
ꟷ KBC Zvezdara
ꟷ Oncology and Radiology
Institute of Serbia
ꟷ Clinical Center Serbia
Croatia
ꟷ Clinical Hospital Dubrava
ꟷ University Hospital Centre
Zagreb
ꟷ Clinical Hospital Center Rijeka
Ukraine
ꟷ TerraLab
ꟷ NSC M.D. Strazhesko Institute
of Cardiology
Brazil
ꟷ Hospital São Vicente
ꟷ Santa Casa de Misericórdia de
Porto Alegre
ꟷ Felício Rocho
ꟷ Erasto Gaertner
ꟷ Instituto de Câncer Dr Arnaldo
ꟷ Hospital Angelina Caron
ꟷ Hospital Ernesto Dornelles
ꟷ Hospital PUC-Campinas
ꟷ Hospital Português de Bahia
ꟷ Fundação Amaral Carvalho
ꟷ A.C. Camargo Cancer Center
ꟷ Hospital Regional do Oeste
ꟷ Hospital São Lucas da PUCRS
ꟷ Hospital Martagão Gesteira
ꟷ Hospital Ana Nery
ꟷ CRIO-Centro Regional
Integrado de Oncologia
ꟷ Hospital Infantil Sabará
ꟷ Hospital Estadual da Criança -
HEC
ꟷ MedRadius
ꟷ Hospital São Paulo
ꟷ DataSUS National Claims
ꟷ CAPED
ꟷ Grupo NotreDame Intermédica
(GNDI)
ꟷ Santa Casa Belo Horizonte
Dynamic Healthcare Codes Interoperability
Diagnosis: SNOMED, ICD-9, ICD-10
Procedures: LOINC, HCPCS, SNOMED, SUT
Demographics: MERNIS, Age, Sex
Lab Tests: HL7, LOINC, CDA, SUT, Barcode, SNOMED
Medication: ATC, NDC
PNEx, our recently patented solution, transforms global data sources to a single queryable data model.
Our ontology system allows for multisource semantic querying to make data interoperable and to standardize terminologies for all coded clinical data.
Query Designer: coding a protocol with I/E criteria
Define Demographics.
Find patient by main disease OR disease sub-type. Introduce time-dependent longitudinal patient search.
Restrict by Lab Test Results (e.g. Tumor Markers) and sub-elements (e.g. CA125, etc.).
Aggregated patient findings, yet useful for decision making.
PNEx semantic querying helps exclude patients conveniently once computed against patients found in inclusion criteria.
Query Designer
Federated Hospital Search of Medical Terminologies
Query Designer
Query Results from Medical Terminologies
Patient Finder
A Patient Identification Tool (only for Hospitals)
Patient's Record Viewer (Inside the Hospital)
Record Viewer behind the Firewall: secure and confidential
Authorized hospital staff retrieve records corresponding to patients that matched the query’s I/E criteria.
A Look Inside Clinerion Technology
How does Clinerion enable external parties (e.g. pharma) to
query sites and ask questions, while both preserving patient
confidentiality and security for the site?
Data Flow within Hospital
[Diagram. Components shown: hospital information system; ETL server (works for all connectors: i2b2, FHIR, HL7, HIS proprietary connectors, ETL applications …); de-identified patient data; Clinerion server inside the hospital IT infrastructure, behind the hospital firewall; secure private Clinerion cloud. Labeled flows: query; reports; aggregated data points.]
Patient data does not leave the hospital
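The principle that only aggregated data points cross the hospital firewall can be sketched as below. This is a hypothetical illustration, not Clinerion's implementation: the record layout, function names, hospital names and criteria are invented, and the real system works on de-identified data through its own query model.

```python
def count_matches(local_records, min_age, diagnosis_code):
    """Runs inside one hospital: evaluates the criteria and returns only a count."""
    return sum(
        1 for record in local_records
        if record["age"] >= min_age and diagnosis_code in record["diagnoses"]
    )

def federated_count(hospitals, min_age, diagnosis_code):
    """Collects one aggregate count per hospital; no patient-level rows are returned."""
    return {
        name: count_matches(records, min_age, diagnosis_code)
        for name, records in hospitals.items()
    }

# Hypothetical de-identified records held locally by two hospitals.
hospitals = {
    "Hospital A": [
        {"age": 54, "diagnoses": {"C56"}},
        {"age": 37, "diagnoses": {"C56", "E11"}},
        {"age": 61, "diagnoses": {"E11"}},
    ],
    "Hospital B": [
        {"age": 48, "diagnoses": {"C56"}},
        {"age": 72, "diagnoses": {"C56"}},
    ],
}
counts = federated_count(hospitals, min_age=40, diagnosis_code="C56")
print(counts, sum(counts.values()))
```

Returning counts rather than rows is what lets an external party assess feasibility without seeing individual records; in practice small counts may also need suppression, which the sketch omits.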
Ongoing Developments at Clinerion: AI/ML
[Diagram: Federated Data Model. Components inside the hospital IT infrastructure, behind the hospital firewall: hospital information system (HIS); ETL server (works for all connectors: i2b2, FHIR, HL7, HIS proprietary connectors, ETL applications …); anonymization server; secure environment encrypted server holding de-identified patient data and a search index of patient records. Components in the secure private network cloud, behind the network firewall: network server. Other labels: query; reports and query results; aggregated data points; full copy.]
No inbound connection ports from Network Cloud opened
PATIENT DATA DOES NOT LEAVE THE HOSPITAL
NEW: Federated Machine Learning Projects
Ø Federated machine learning server: hardware including a GPU (Nvidia preferred), software stack to execute machine learning projects, API to patient records
Ø Labeled flows: model training instructions; model result; model merging
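The slide names a model merging step without describing it. One common way to merge models trained separately at each hospital is federated averaging, in which parameter vectors are averaged with weights proportional to the number of local training samples. The sketch below assumes that approach; it is not confirmed as the method used here, and the function name and numbers are hypothetical.

```python
def federated_average(site_updates):
    """Sample-weighted average of model parameters from several sites.

    site_updates: list of (n_samples, parameters) pairs, where parameters is a
    list of floats of equal length at every site.
    """
    total_samples = sum(n for n, _ in site_updates)
    size = len(site_updates[0][1])
    return [
        sum(n * params[i] for n, params in site_updates) / total_samples
        for i in range(size)
    ]

# Hypothetical parameter vectors returned by two hospitals after local training.
updates = [
    (100, [1.0, 2.0]),   # hospital with 100 training records
    (300, [3.0, 6.0]),   # hospital with 300 training records
]
merged = federated_average(updates)
print(merged)  # [2.5, 5.0]
```

Only the parameter vectors and sample counts travel to the network server in this scheme, which matches the statement that patient data does not leave the hospital.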
Clinerion Ltd - International Institute for the Safety of Medicines
This document is confidential and is intended solely for the use and information of the persons to whom it is addressed.
Without the consent of Clinerion neither concept nor individual information from this document may be reproduced or passed on to third parties.
Thank you
daniel.gutierrez@clinerion.com
www.clinerion.com
The Expanding Role of Mobile Healthcare Providers in DCTs
Pierre Etienne, MD
Co-Founder & CMO
Actu-Real, Inc.
pierre.etienne@actu-real.com
DCTs benefits
Ø Fewer trial sites
Ø Reduced number of IRBs
Ø Reduced number of resubmissions
Ø Reduced variability
Ø Potential improved compliance
Ø Potential increase in study safety
Ø Outcomes more closely reflective of the real-world environment
Ø Improvement of trial access
Mobile HCPs contributions to DCTs
Removal of obstacles / Increased participation
Ø Trial duration
Ø Frequency of visits
Ø Disease state
Ø Distance to investigative site
Ø Travel plans (snowbirds)
Data integrity definition (FDA) 2016
Ø Complete
Ø Consistent
Ø Accurate
Ø Attributable
Ø Legible
Ø Contemporaneously recorded
Ø Original or true copy
Traditional mobile HCPs role
Ø Blood draws
Ø Clinical assessments
Ø IMP or treatment administration
Ø Participant education (not IT)
Ø In-home compliance check
Source: Clinical Trials Transformation Initiative (CTTI)
Future mobile HCPs role
Ø Spokesperson for the sponsor / trialist
Ø Blood draws
Ø Clinical assessments
Ø Re-assessment of participant understanding of ePRO instrument
Ø IT technical support
Ø Project management
Ø Social work
Ø Deep structure medical imaging?
Portable deep structure medical imaging
Ø Already used by paramedical personnel
Ø At-home use?
Ø Imaging of organs?
Ø Secure transmission
Ø Interpretation by licensed radiologist
Future training needs for HCPs working in DCTs
Ø GCP
Ø Human participant protection
Ø Data protection
Ø Trial-specific requirements
Ø IT troubleshooting
Ø Medical imaging?
Ø Affordable to small trialists?
Data Integrity Detection Using Artificial Intelligence
Clifton Chow, PhD
HEOR Consultant
Actu-Real, Inc.
clifton.chow@actu-real.com
Machine Learning Methodology
Ø Machine Learning (ML) techniques are used to assess the integrity/acceptability of ECG signals from wearable devices.
Ø ML algorithms were trained and tested on the PhysioNet/CinC 2017 Challenge training dataset.
Ø This dataset is a good representation of the data obtained from a wearable device.
Performance of Algorithms
• The study tested 13 machine learning algorithms
• This presentation will focus on the top three performing algorithms that predict ECG signal integrity
Bagging Using Neural Networks: Simulation
Ø Bootstrap Aggregation/Bagging (B): ensemble method where neural network models are generated by sampling with replacement at random.
Ø This random sampling process found three NN models that exhibited excellent results.
Ø The figure shows the model (neural network) that exhibited the best performance in predicting ECG signal integrity, with 99.47% accuracy.
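The bagging procedure described above can be sketched in a few lines. To keep the example self-contained, the base learner is a one-feature threshold rule (a decision stump) rather than the neural networks used in the study, and the single feature, a signal-quality score between 0 and 1, is hypothetical, as are the data and function names.

```python
import random

def fit_stump(samples):
    """Pick the threshold with the fewest training errors; predicts 1 when x >= threshold."""
    best = None
    for threshold, _ in samples:
        errors = sum(1 for x, y in samples if (1 if x >= threshold else 0) != y)
        if best is None or errors < best[0]:
            best = (errors, threshold)
    return best[1]

def bagging_fit(samples, n_models=25, seed=0):
    """Train each base model on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        bootstrap = [rng.choice(samples) for _ in samples]
        models.append(fit_stump(bootstrap))
    return models

def bagging_predict(models, x):
    """Majority vote over the base models."""
    votes = sum(1 for threshold in models if x >= threshold)
    return 1 if votes * 2 > len(models) else 0

# Hypothetical (quality score, label) pairs; label 1 = acceptable ECG segment.
training = [(0.10, 0), (0.20, 0), (0.30, 0), (0.35, 0),
            (0.60, 1), (0.70, 1), (0.80, 1), (0.90, 1)]
ensemble = bagging_fit(training)
print(bagging_predict(ensemble, 0.95), bagging_predict(ensemble, 0.05))
```

Because each model sees a different resample, their individual errors partly cancel in the vote; the same aggregation applies unchanged when the base learner is a neural network.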
Gradient Boosting Algorithm: Processing
Ø Ensemble technique in which a sequence of variables is constructed additively to find the best prediction.
Ø At each iteration a variable is added and accuracy is checked by noting how many ECG raw values it failed to predict.
Ø The iteration that minimizes false predictions (circles and squares) is then adopted for Decision Trees (next slide).
Gradient Boosting using Decision Trees
Ø A decision tree (DT) is a learning method in which important features are selected from regression models.
Ø A predicted value is assigned T (1) if it matches the raw data and F (0) otherwise (Figure 6). The DT ensemble method predicted ECG signal integrity at 98.92% and 95.25% accuracy.
Ø DT requires minimal computing memory and can classify ECG signals very quickly.
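A minimal sketch of gradient boosting with decision trees follows, assuming the textbook formulation: each round fits a one-split regression tree (a stump) to the current residuals of a squared-error loss and adds a damped copy of it to the ensemble. The loss, learning rate, single hypothetical feature and data are assumptions for illustration and are not taken from the study.

```python
def fit_regression_stump(xs, residuals):
    """Best single split on x by squared error; returns (threshold, left_value, right_value)."""
    best = None
    for threshold in sorted(set(xs))[1:]:
        left = [r for x, r in zip(xs, residuals) if x < threshold]
        right = [r for x, r in zip(xs, residuals) if x >= threshold]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        sse = (sum((r - left_value) ** 2 for r in left)
               + sum((r - right_value) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_value, right_value)
    return best[1:]

def gradient_boost(xs, ys, n_rounds=30, learning_rate=0.3):
    """Additively build an ensemble of stumps, each fitted to the current residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared-error loss the negative gradient is the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_value, right_value = fit_regression_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        predictions = [
            p + learning_rate * (left_value if x < threshold else right_value)
            for x, p in zip(xs, predictions)
        ]
    return base, learning_rate, stumps

def boosted_predict(model, x):
    """Sum the base value and the damped contribution of every stump."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(
        left if x < threshold else right for threshold, left, right in stumps
    )

# Hypothetical quality scores and labels (1 = acceptable ECG segment).
scores = [0.10, 0.20, 0.30, 0.35, 0.60, 0.70, 0.80, 0.90]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
model = gradient_boost(scores, labels)
print(round(boosted_predict(model, 0.95), 3), round(boosted_predict(model, 0.05), 3))
```

Thresholding the boosted score at 0.5 turns it into an accept/reject decision. Each stump needs only one comparison at prediction time, which is in line with the low memory and computation cost claimed for tree-based models.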
Evaluating Costs through Computation Complexity
Ø The bagging and gradient boosting models both require 99 multiplication and 90-91 addition computations.
Ø The total energy required to execute the computations is lowest (0.039); both Bagging and Gradient Boosting can be implemented on wearable technology at a low cost.
Conclusion
Ø The neural network ensembles using bagging and gradient boosting (encircled as B2 and GB2 box plots in the figure) exhibit the best integrity classification across all parameters.
Ø Computer chips with energy-efficient deep neural networks have the potential to be deployed on IoT wearable devices for reliable ECG signal quality assessment.
References
• John, Arlene & Panicker, Rajesh & Cardiff, Barry & Lian, Yong & John, Deepu. (2020). Binary Classifiers for Data Integrity Detection in Wearable IoT Edge Devices. IEEE Open Journal of Circuits and Systems. 1. 88-99. 10.1109/OJCAS.2020.3009520.
• Exploring the clinical features of narcolepsy type 1 versus narcolepsy type 2 from European Narcolepsy Network database with machine learning - Scientific Figure on ResearchGate. Available from: https://www.researchgate.net/figure/A-simple-example-of-visualizing-gradient-boosting_fig5_326379229 [accessed 3 May, 2022]
Thank You
Thank you for participating!

Data Integrity in Decentralized Clinical Trials (DCTs)

  • 1.
    Data Integrity in DecentralizedClinical Trials (DCTs) Clifton Chow, PhD HEOR Consultant Actu-Real, Inc. Pierre Etienne, MD Co-Founder & CMO Actu-Real, Inc. Daniel Gutierrez, PhD Chitra Lele, PhD Founder and President Actu-Real, Inc. Director, Customer Solutions Clinerion Ltd
  • 2.
    Copyright 2022. AllRights Reserved. Contact Presenter for Permission Centralized Monitoring for DCTs Chitra Lele, PhD Actu-Real, Inc. Founder and President chitra.lele@actu-real.com
  • 3.
  • 4.
    Benefits of DCTs 4 Benefitsof DCTs ØFor sponsors: ØImproved recruitment speed ØReduction in time and cost ØBetter outcomes: increased patient retention and diversity ØFor patients: ØPatient friendly enrollment and participation ØBetter patient experience ØImproved retention leading to better outcomes Digital Enablers of DCTs across the clinical continuum ØElectronic consent (eConsent) ØTelehealth visits ØData capture at source ØElectronic patient reported outcomes (ePRO) ØElectronic clinical outcome assessment (eCOA) ØWearables, sensors ØRemote monitoring ØPatient engagement platforms
  • 5.
    Data Challenges andOpportunities 5 Data Challenges ØHigh volume and heterogenous data ØWidely disparate data sources – dynamic and not following data standards ØSecurity risks ØData monitoring challenges Opportunities ØAccess to data amenable to advanced analytics ØDetect patterns and explore novel endpoints Ø A large majority of trials projected to use Digital Health Technologies in the next few years Ø Capture both objective (e.g., sensors/ wearables) and subjective (e.g., ePRO) data streams that complement each other and significantly enhance evidence generation Ø The depth and breadth of data from all these sources require monitoring and data management methods outside the traditional data cleaning and reconciliation activities. Digital data in DCTs and Data Cleaning
  • 6.
  • 7.
    Centralized Monitoring 7 Traditional monitoring(retrospective SDV) not possible Data is collected at source or is patient- reported Centralized monitoring; Holistic data surveillance approach Advanced statistical and analytical tools to identify data gaps, anomalies Identify study issues more serious than those identified by transactional data reviews Data aggregation for optimized approach to centralized monitoring Aggregation of data from siloed and disparate systems Unified picture of the patient journey, site performance, and overall trial health Centralized monitoring strategies built on unified data platforms can help in real-time aggregation and analysis of data, systemic risk monitoring and early issue detection provide insights to a variety of functional teams across the trial continuum Centralized Monitoring
  • 8.
    Centralized Data Management ØTraditional data management approach of data review on CRFs and edit checks is not relevant Ø Centralized data management and verification is required Ø Real-time and proactive Ø Scalable Ø Data checks at point of capture Ø Continuous oversight of data integration, flow, and quality Ø Risk-based data management strategies Ø Risk: due to collection and aggregation of data across disparate sources and modalities Ø The focus is on core processes and critical data points most likely to impact data integrity and interpretability 8 Monitoring and Data Management are not as distinct in the centralized approach
  • 9.
    Centralized Monitoring Considerations:non-CRF data Ø eCOA and electronic patient-reported outcomes (ePRO): challenge to ensure patients are compliant with completing questionnaires and diaries Ø Need to ensure consistency between patient-reported data and data collected during procedures or assessed by the physician, i.e., merging subjective and objective outcomes Ø Comparison of data during on-site visits vs data collected remotely during home-health visits Ø To detect misconduct: system audits, geo tags Ø Lag/variability in data flow: is there a signal? Ø How to define non-compliance in case of data streaming? Ø Higher risk of fraudulent data/misconduct in DCTs with data from ePROs, wearables etc? 9
  • 10.
  • 11.
    CSM Ø Centralized StatisticalMonitoring: Centralized Monitoring + Statistical Monitoring Ø Statistical monitoring: Ø Complex statistical algorithms recommended by TransCelerate to discover data outliers and anomalies, to inform monitoring, escalation or communication actions Ø Used on all data and all variables that influence data quality Ø Relies on the highly structured nature of data; each protocol is expected to be implemented consistently at all sites Ø The multivariate structure and/or time dependence of variables are sensitive to deviations and hard to copy Ø Fabricated data, even if plausible from a univariate perspective, are likely to exhibit abnormal multivariate patterns that are detectable statistically Ø Bayesian CSM Ø Bayesian finite mixture models (FMM) used to model patient outcome values of both atypical and typical sites. Ø Assuming that majority of the sites are ‘normal’, the ‘body distribution’ is determined such that it has the largest mixture parameter value of finite mixture models. Ø Atypical sites are detected based on the posterior predictive distribution of normal site's outcome values derived from only the chosen body distribution. 11
  • 12.
    CSM 12 Benefits Data quality checkson all trial centers at the subject and site level 1. statistical analysis of the data in real time to identify sites that need further investigation due to unusual data patterns; analysis of site characteristics to define poorly performing sites 2. identification of data trends not easily detected on-site like data consistency and accuracy or missing data 3. remote verification of critical source data Tangible benefits and assistance in improving RBM strategies Early identification of anomalies in data, with use of latest technologies and complex statistical algorithms, provide an opportunity to address issues as they are uncovered, reduce the risk of regulatory submission failure Improves data integrity of a clinical trial CSM method can be used as an oversight tool for the CROs
  • 13.
  • 14.
    Statistical Monitoring 14 Descriptive statisticsto identify outliers or influential observations Comparison of Study-wise and subject-specific confidence intervals with multiplicity-adjusted alpha SD analysis – repeated measures model for SD estimates Correlations – subject, site & study level correlations between variables; compared using CIs Pearson correlation analysis between key variables to check if data copied across sites 1-way ANOVA to compare groups; Chi-square Goodness of fit test to check difference in frequency distributions Mahalanobis distance (MD): multidimensional risk assessment method based on multidimensional risk score; flexible method to combine dimensions and identify risk factors; inliers, outliers Fisher’s exact/Chi-square test, descriptive stat for Digit preference, rounding, Carry-over effect, repeated values Methods
  • 15.
    Transcelerate Biopharma’s experiment:Methodology 15 Vital sign evaluations at each visit, hence selected SBP, DBP and pulse rate FEV1 and FVC selected because the data collected serially as well as produced mechanistically and evaluated centrally, thus significantly reducing the chance for human error Selected data were deleted and replaced with fabricated plausible data Tested statistical monitoring on a data set from a chronic obstructive pulmonary disease (COPD) clinical study with 178 sites and 1554 subjects. Fabricated data selectively implanted in 7 sites and 43 subjects by expert clinicians in COPD; Data set partitioned to simulate studies of different sizes Reference: Knepper et al, Statistical Monitoring in Clinical Trials: Best Practices for Detecting Data Anomalies Suggestive of Fabrication or Misconduct, Therapeutic Innovation & Regulatory Science 2016, Vol. 50(2) 144-154
  • 16.
    Transcelerate Biopharma’s experiment- Analysis Ø MD for each subject & studywide was calculated separately for vital signs and spirometry measurements. Ø A large MD for a subject would correspond to an outlier, and a small MD would correspond to an inlier. Ø Carryover effect/repeated values Ø Carryover, defined as an exact match of a value for a subject from one visit to the next, was calculated. Ø Repeated values, defined as the number of identical values for a subject within a visit (for spirometry) or overall, were calculated. Ø Digit preference and rounding Ø For each subject, last digit frequency distribution was compared to other subjects at that site and across the study using either a Chi-square or Fisher exact test, as appropriate, and by comparing the mean and SD of last digit value within a subject to studywide distributions using the CIs. 16
  • 17.
    Transcelerate Biopharma’s experiment- Results Ø Results Ø The algorithm identified 11 sites (19%), 19 sites (31%), 28 sites (16%), and 45 sites (25%) as having potentially fabricated data for studies 2A, 2, 1A, and 1, respectively. Ø For study 2A, 3 of 7 sites with fabricated data were detected, 5 of 7 were detected for studies 2 and 1A, and 6 of 7 for study 1. Ø Except for study 2A, the algorithm had good sensitivity and specificity (>70%) for identifying sites with fabricated data. Ø Conclusions Ø Recommendation of a cross-functional, collaborative approach to statistical monitoring that can adapt to study design and data source and use a combination of statistical screening techniques and confirmatory graphics. 17
  • 18.
    Limitation and FutureTrend Limitation Ø The effectiveness of statistical monitoring is questionable in low–data volume conditions Ø A site may not be flagged if it is small (small sample size) Ø A small site may be flagged because of a single subject, increasing false-positive rates Ø An adaptive algorithm that uses different cut-offs for subject and site flagging depending on the stage and size of the study may be required The success of Central monitoring activities relies on the skills of people, well-defined processes, and technologies that enable the translation of data into information, and resulting actions and decisions. Future Trend Ø Increased use of AI and ML for pattern/anomaly identification, review using data visualization and automated statistical analyses 18
  • 19.
  • 20.
    Clinerion: A Global Interoperable Patient Network for Trial Feasibility & Recruitment
    Daniel Gutierrez, PhD
    Director, Customer Solutions
    Clinerion Ltd
    daniel.gutierrez@clinerion.com
  • 21.
    Clinerion Ltd. CONFIDENTIAL
    Clinerion Global Research Network
    TOTAL CONTRACTED: 264.3 M Patients @ 300+ Sites
    LIVE (online, in real-time): 248.8 M Patients
    Argentina: Instituto de Diagnóstico e Investigaciones Metabólicas (IDIM)
    Switzerland: University Hospital Basel; Privatklinikgruppe Hirslanden (17 clinics)
    USA: MEDICARE/MEDICAID claims data
    Colombia: Hospital Pablo Tobón Uribe; Clínica FOSCAL Internacional
    Uruguay: CASMU
    Taiwan R.O.C.: Show Chwan Hospital
    Turkey: Istanbul University (5 hospitals); Malatya İnönü University; Konya Necmettin Erbakan University; Çekmece Mehmet Akif Public Hospital; Baskent University (8 hospitals); Karadeniz Technical University - Medical Faculty; Ege University Medical Faculty Hospital
    South Korea: Korea University Medical Center
    United Arab Emirates: UAE Claims data; Cleveland Clinic
    India: Jehangir Clinical Development Centre; Pawana Hospital
    Saudi Arabia: E1 Cluster, Dammam (22 hospitals), incl. King Fahad Specialist Hospital, Maternity and Children Hospital, Dammam Medical Complex, Qatif, Jubail General Hospital, Alkhafji Hospital; King Abdullah International Medical Research Center (KAIMRC) (5 hospitals)
    Bulgaria: DCC Ascendent
    Romania: Pediatrics Hospital Louis Turcanu Timisoara
    France: Centre Hospitalier De Troyes
    Spain: Hospitales de Madrid (17 hospitals), incl. HM Madrid, HM Montepríncipe, HM Torrelodones, HM Sanchinarro, HM Nuevo Belén, HM Puerta del Sur, HM Vallés, HM San Francisco, HM Regla, HM Modelo, HM Belén, HM La Esperanza, HM Rosaleda, HM Vigo, HM Nou Delfos, HM Sant Jordi, HM Nens
    Greece: Papageorgiou Hospital; Athens Medical Group; Hygeia Hospital Athens; Theageneio Anticancer Hospital
    Estonia: Tartu University Hospital; North Estonia Medical Center
    Armenia: Arabkir Hospital
    Hungary: Svabhegy Gyermekgyógyintézet; CityZen Health Centre; Multiklinika; Kastelypark Clinic; University of Debrecen
    United Kingdom: Wellbeing primary care network (>4,000 sites)
    Georgia: Georgian-Dutch Hospital; Tbilisi Heart and Vascular Center; Medison clinics; EVEX Hospitals (52 hospitals)
    Poland: Szpital Specjalistyczny w Brzozowie; EMC MEDICAL INSTITUTE SA (Penta Hospitals) (16 sites)
    Serbia: KBC Zvezdara; Oncology and Radiology Institute of Serbia; Clinical Center Serbia
    Croatia: Clinical Hospital Dubrava; University Hospital Centre Zagreb; Clinical Hospital Center Rijeka
    Ukraine: TerraLab; NSC M.D. Strazhesko Institute of Cardiology
    Brazil: Hospital São Vicente; Santa Casa de Misericórdia de Porto Alegre; Felício Rocho; Erasto Gaertner; Instituto de Câncer Dr Arnaldo; Hospital Angelina Caron; Hospital Ernesto Dornelles; Hospital PUC-Campinas; Hospital Português de Bahia; Fundação Amaral Carvalho; A.C. Camargo Cancer Center; Hospital Regional do Oeste; Hospital São Lucas da PUCRS; Hospital Martagão Gesteira; Hospital Ana Nery; CRIO-Centro Regional Integrado de Oncologia; Hospital Infantil Sabará; Hospital Estadual da Criança - HEC; MedRadius; Hospital São Paulo; DataSUS National Claims; CAPED; Grupo NotreDame Intermédica (GNDI); Santa Casa Belo Horizonte
  • 22.
    Dynamic Healthcare Codes Interoperability
    Ø Diagnosis: SNOMED, ICD-9, ICD-10
    Ø Procedures: LOINC, HCPCS, SNOMED, SUT
    Ø Demographics: Mernis, Age, Sex
    Ø Lab Tests: HL7, LOINC, CDA, SUT, Barcode, SNOMED
    Ø Medication: ATC, NDC
    PNEx, our recently patented solution, transforms global data sources into a single queryable data model. Our ontology system allows multisource semantic querying, making data interoperable and standardizing terminologies for all coded clinical data.
  • 23.
    QueryDesigner: coding a protocol with I/E criteria
    Ø Define Demographics.
    Ø Find patient by main disease OR disease sub-type.
    Ø Introduce time-dependent longitudinal patient search.
    Ø Restrict by Lab Test Results (e.g. Tumor Markers) and sub-elements (e.g. CA125 etc).
    Ø Aggregated patient findings, yet useful for decision making.
    Ø PNEx semantic querying helps exclude patients conveniently once computed against patients found in inclusion criteria.
  • 24.
    QueryDesigner: Federated Hospital Search of Medical Terminologies
  • 25.
    QueryDesigner: Query Results from Medical Terminologies
  • 26.
    PatientFinder: A Patient Identification Tool (only for Hospitals)
  • 27.
    Patient's Record Viewer (Inside the Hospital)
  • 28.
    Record Viewer behind the Firewall: secure and confidential
    Authorized hospital staff retrieve the records of patients who matched the query's I/E criteria.
  • 29.
    A Look Inside Clinerion Technology
    How does Clinerion enable external parties (e.g. pharma) to query sites and ask questions, while preserving both patient confidentiality and security for the site?
  • 30.
    Data Flow within Hospital
    Diagram components:
    Ø HOSPITAL IT INFRASTRUCTURE (behind the HOSPITAL FIREWALL): HOSPITAL INFORMATION SYSTEM; ETL SERVER (works for all connectors: i2b2, FHIR, HL7, HIS proprietary connectors, ETL applications …); DE-IDENTIFIED PATIENT DATA; CLINERION SERVER
    Ø SECURE PRIVATE CLINERION CLOUD: QUERY; REPORTS; AGGREGATED DATA POINTS
    Ø Patient data does not leave the hospital
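The pattern in this diagram, where a query goes in and only aggregated data points come out, can be illustrated with a small sketch. Everything here is hypothetical: `HospitalNode`, `federated_count`, the record fields, and the eligibility rule are illustrative names and are not Clinerion's API or data model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class HospitalNode:
    """Hypothetical stand-in for one hospital's local server."""
    name: str
    records: List[dict]  # de-identified records; never returned to the caller

    def count_matches(self, criteria: Callable[[dict], bool]) -> int:
        # Runs inside the hospital; only the aggregate count crosses the firewall.
        return sum(1 for record in self.records if criteria(record))

def federated_count(nodes: List[HospitalNode],
                    criteria: Callable[[dict], bool]) -> Dict[str, int]:
    """Send the same query to every node and collect aggregated counts only."""
    return {node.name: node.count_matches(criteria) for node in nodes}

def eligible(record: dict) -> bool:
    # Illustrative inclusion/exclusion criteria, not taken from any protocol.
    return record["age"] >= 18 and record["ca125"] > 35 and not record["pregnant"]

nodes = [
    HospitalNode("site_a", [{"age": 54, "ca125": 80, "pregnant": False},
                            {"age": 17, "ca125": 90, "pregnant": False}]),
    HospitalNode("site_b", [{"age": 61, "ca125": 20, "pregnant": False},
                            {"age": 47, "ca125": 55, "pregnant": False},
                            {"age": 33, "ca125": 60, "pregnant": True}]),
]
counts = federated_count(nodes, eligible)
print(counts)
```

The caller sees one count per site and never a patient record, which is the property the slide summarizes as "Patient data does not leave the hospital".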
  • 31.
    Ongoing Developments at Clinerion: AI/ML
    Diagram components:
    Ø HOSPITAL IT INFRASTRUCTURE (behind the HOSPITAL FIREWALL; no inbound connection ports from Network Cloud opened): HOSPITAL INFORMATION SYSTEM (HIS); ETL SERVER (secure environment, encrypted server; works for all connectors: i2b2, FHIR, HL7, HIS proprietary connectors, ETL applications …); ANONYMIZATION SERVER; DE-IDENTIFIED PATIENT DATA; FULL COPY; SEARCH INDEX; PATIENT RECORDS; NETWORK SERVER
    Ø NEW: FEDERATED MACHINE LEARNING SERVER: hardware including GPU (Nvidia preferred), software stack to execute machine learning projects, API to patient records
    Ø SECURE PRIVATE NETWORK CLOUD (behind the NETWORK FIREWALL): QUERY; REPORTS, query results; AGGREGATED DATA POINTS
    Ø NEW: Federated Machine Learning Projects: MODEL TRAINING INSTRUCTIONS; MODEL MERGING; MODEL RESULT
    Ø Federated Data Model
    Ø PATIENT DATA DOES NOT LEAVE THE HOSPITAL
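The slide names a "model merging" step but does not say how models are merged. One common approach is federated averaging, in which each site trains locally and only model weights, weighted by local sample count, are combined centrally. The sketch below assumes that approach. `merge_models` and the toy weights are illustrative and do not describe Clinerion's implementation.

```python
def merge_models(site_updates):
    """Federated averaging.

    site_updates: list of (weights, n_samples) pairs, one per hospital, where
    weights is a list of floats from a model trained on that hospital's data.
    Returns the sample-weighted average of the weights.
    """
    total_samples = sum(n for _, n in site_updates)
    n_weights = len(site_updates[0][0])
    merged = [0.0] * n_weights
    for weights, n_samples in site_updates:
        for i, w in enumerate(weights):
            merged[i] += w * n_samples / total_samples
    return merged

# Two hospitals: the second trained on three times as many patients.
updates = [([1.0, 2.0], 1), ([3.0, 4.0], 3)]
merged = merge_models(updates)
print(merged)  # [2.5, 3.5]
```

Only the weight vectors and sample counts leave each site in this scheme, so it fits the constraint that patient data stays inside the hospital.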
  • 32.
    Thank you
    daniel.gutierrez@clinerion.com
    www.clinerion.com
    Clinerion Ltd - International Institute for the Safety of Medicines
    This document is confidential and is intended solely for the use and information of the persons to whom it is addressed. Without the consent of Clinerion, neither concept nor individual information from this document may be reproduced or passed on to third parties.
  • 33.
    The Expanding Role of Mobile Healthcare Providers in DCTs
    Pierre Etienne, MD
    Co-Founder & CMO
    Actu-Real, Inc.
    pierre.etienne@actu-real.com
  • 34.
    DCT benefits
    Ø Fewer trial sites
    Ø Reduced number of IRBs
    Ø Reduced number of resubmissions
    Ø Reduced variability
    Ø Potential improved compliance
    Ø Potential increase in study safety
    Ø Outcomes more closely reflective of the real-world environment
    Ø Improvement of trial access
  • 35.
    Mobile HCPs' contributions to DCTs
    Removal of obstacles / Increased participation
    Ø Trial duration
    Ø Frequency of visits
    Ø Disease state
    Ø Distance to investigative site
    Ø Travel plans (snowbirds)
  • 36.
    Data integrity definition (FDA, 2016)
    Ø Complete
    Ø Consistent
    Ø Accurate
    Ø Attributable
    Ø Legible
    Ø Contemporaneously recorded
    Ø Original or true copy
  • 37.
    Traditional mobile HCPs' role
    Ø Blood draws
    Ø Clinical assessments
    Ø IMP or treatment administration
    Ø Participant education (not IT)
    Ø In-home compliance check
    Source: Clinical Trials Transformation Initiative (CTTI)
  • 38.
    Future mobile HCPs' role
    Ø Spokesperson for the sponsor / trialist
    Ø Blood draws
    Ø Clinical assessments
    Ø Re-assessment of participant understanding of ePRO instrument
    Ø IT technical support
    Ø Project management
    Ø Social work
    Ø Deep structure medical imaging?
  • 39.
    Portable deep structure medical imaging
    Ø Already used by paramedical personnel
    Ø At-home use?
    Ø Imaging of organs?
    Ø Secure transmission
    Ø Interpretation by licensed radiologist
  • 40.
    Future training needs for HCPs working in DCTs
    Ø GCP
    Ø Human participant protection
    Ø Data protection
    Ø Trial-specific requirements
    Ø IT troubleshooting
    Ø Medical imaging?
    Ø Affordable to small trialists?
  • 41.
    Data Integrity Detection Using Artificial Intelligence
    Clifton Chow, PhD
    HEOR Consultant
    Actu-Real, Inc.
    clifton.chow@actu-real.com
  • 42.
    Machine Learning Methodology
    Ø Machine Learning (ML) techniques are used to assess the integrity/acceptability of ECG signals from wearable devices.
    Ø ML algorithms were trained and tested on the PhysioNet/CinC 2017 Challenge training dataset.
    Ø This dataset is a good representation of the data obtained from a wearable device.
  • 43.
    Performance of Algorithms
    Ø The study tested 13 machine learning algorithms
    Ø This presentation will focus on the top three performing algorithms that predict ECG signal integrity
  • 44.
    Bagging Using Neural Networks: Simulation
    Ø Bootstrap Aggregation/Bagging (B): an ensemble method in which neural network models are generated from random samples drawn with replacement.
    Ø This random sampling process found three NN models that exhibited excellent results.
    Ø The figure shows the model (neural network) that exhibited the best performance, predicting ECG signal integrity with 99.47% accuracy.
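The bagging procedure on this slide, which draws bootstrap samples, fits one model per sample, and aggregates their votes, can be sketched as follows. To stay self-contained, the sketch swaps the neural networks for a one-feature threshold classifier and uses a made-up "signal quality score" as the only feature. Both are assumptions for illustration and not the study's setup.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw len(data) items from data at random, with replacement."""
    return [rng.choice(data) for _ in data]

def fit_threshold(sample):
    """Base learner (stand-in for a neural network): predict 1 (acceptable)
    when x >= t, choosing t among sample values to minimize training errors."""
    best_t, best_err = None, None
    for t in sorted({x for x, _ in sample}):
        err = sum((x >= t) != bool(y) for x, y in sample)
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t

def bagging_fit(data, n_models, seed=0):
    rng = random.Random(seed)
    return [fit_threshold(bootstrap_sample(data, rng)) for _ in range(n_models)]

def bagging_predict(thresholds, x):
    """Majority vote over all base learners."""
    votes = Counter(int(x >= t) for t in thresholds)
    return votes.most_common(1)[0][0]

# Toy data: (quality score, label), label 1 = acceptable ECG segment.
data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 0),
        (0.6, 1), (0.7, 1), (0.8, 1), (0.9, 1)]
models = bagging_fit(data, n_models=25)
print(bagging_predict(models, 0.95), bagging_predict(models, 0.05))
```

An odd number of base learners avoids tied votes. Replacing `fit_threshold` with a neural network trainer would give the ensemble type named on the slide.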
  • 45.
    Gradient Boosting Algorithm: Processing
    Ø Ensemble technique in which a sequence of variables is constructed additively to find the best prediction.
    Ø At each iteration a variable is added, and accuracy is checked by noting how many ECG raw values it failed to predict.
    Ø The iteration that minimizes false predictions (circles and squares) is then adopted for Decision Trees (next slide).
  • 46.
    Gradient Boosting using Decision Trees
    Ø A decision tree (DT) is a learning method in which important features are selected from regression models.
    Ø Each predicted value is assigned T (1) or F (0) according to whether it matches the raw data (Figure 6).
    Ø The DT ensemble method predicted ECG signal integrity at 98.92% and 95.25% accuracy.
    Ø DT requires minimal computing memory and can classify ECG signals very quickly.
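The additive idea behind gradient boosting with decision trees can be shown with a minimal sketch. In the standard squared-loss formulation used here, each round fits a depth-1 tree (a stump) to the current residuals and adds a shrunken copy of it to the ensemble. The toy data, the single feature, and the function names are assumptions for illustration and not the study's configuration.

```python
def fit_stump(xs, residuals):
    """Depth-1 regression tree: best single split by squared error.
    Returns (threshold, mean_left, mean_right)."""
    best = None
    for t in sorted(set(xs))[1:]:  # skip the minimum so the left side is never empty
        left = [r for x, r in zip(xs, residuals) if x < t]
        right = [r for x, r in zip(xs, residuals) if x >= t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    return best[1:]

def fit_boosting(xs, ys, n_rounds=20, learning_rate=0.5):
    base = sum(ys) / len(ys)           # start from the mean label
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lm, rm = fit_stump(xs, residuals)
        stumps.append((t, lm, rm))     # add one tree per round
        preds = [p + learning_rate * (lm if x < t else rm)
                 for x, p in zip(xs, preds)]
    return base, learning_rate, stumps

def predict_score(model, x):
    base, learning_rate, stumps = model
    return base + sum(learning_rate * (lm if x < t else rm) for t, lm, rm in stumps)

# Toy data: one quality feature, label 1 = acceptable ECG segment.
xs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
model = fit_boosting(xs, ys)
errors = sum((predict_score(model, x) >= 0.5) != bool(y) for x, y in zip(xs, ys))
print("training errors:", errors)
```

Counting how many training values the ensemble still misclassifies after each round is the accuracy check described on the preceding slide. Here the count is taken once, after the final round.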
  • 47.
    Evaluating Costs through Computation Complexity
    Ø The bagging and gradient boosting models both require 99 multiplication and 90-91 addition computations.
    Ø The total energy required to execute the computations is the lowest (0.039), so both bagging and gradient boosting can be implemented on wearable technology at a low cost.
  • 48.
    Conclusion
    Ø The neural network ensembles using bagging and gradient boosting (encircled as the B2 and GB2 box plots in the figure) exhibit the best integrity classification across all parameters.
    Ø Computer chips with energy-efficient deep neural networks have the potential to be deployed on IoT wearable devices for reliable ECG signal quality assessment.
  • 49.
    References
    Ø John, A., Panicker, R., Cardiff, B., Lian, Y., & John, D. (2020). Binary Classifiers for Data Integrity Detection in Wearable IoT Edge Devices. IEEE Open Journal of Circuits and Systems, 1, 88-99. doi:10.1109/OJCAS.2020.3009520
    Ø Exploring the clinical features of narcolepsy type 1 versus narcolepsy type 2 from European Narcolepsy Network database with machine learning - Scientific Figure on ResearchGate. Available from: https://www.researchgate.net/figure/A-simple-example-of-visualizing-gradient-boosting_fig5_326379229 [accessed 3 May 2022]
  • 50.
  • 51.
    Thank you for participating!
    CLICK HERE to learn more and watch the webinar