How AI Can Help Anonymize Clinical Trial Data

A human-in-the-loop, analytics-driven approach for anonymization and redaction of clinical data submissions
Ganes Kesari
August 23, 2022

Ganes Kesari
Co-founder & Chief Decision Scientist
“Simplify Data Science for all”
100+ Clients
Solve business problems using insights and stories on a low-code platform
@kesaritweets
/gkesari

Anonymizing Clinical Study Reports (CSRs) with the right balance of privacy and transparency is challenging
- New regulatory requirements, such as EMA Policy 0070 and Health Canada's Public Release of Clinical Information (PRCI)
- Specific data privacy guidelines for all reports published in the public domain
- These regulations also call for a level of data transparency
- They prescribe sufficient data granularity to help the scientific community
Figure: Privacy and Transparency in balance

Anonymization of CSRs is a three-fold problem
Human Errors
- Time-consuming and cumbersome process with many complex steps
- Error-prone, requiring multiple manual reviews
Unstructured Content
- No plug-and-play, off-the-shelf named entity recognition (NER) models
- Requires pharma domain-specific entities
Regulatory Constraints
- Complex and rapidly evolving regulations
- High quality bar, with stringent re-identification risk thresholds

1. Regulatory Constraints: Increasing regulations raise compliance costs and the likely penalties for breaches
- Typical annual spend of $2-3 Mn on external vendors
- Typical cost of a healthcare breach was $9.2 Mn per incident
- 84% increase in healthcare data breaches, impacting 45 million people
- Technology advances and public data increase re-identification risks
Source: IBM, Cost of a Data Breach Report

2. Human Errors: Clinical teams go through a long and cumbersome process for CSR anonymization
- Time-consuming processes, with cycle times of up to 45 days for each summary document
- 25+ complex steps to achieve anonymization across different clinical trial management systems
- Higher potential for error, with data flowing across multiple internal systems, databases, and emails for reviews and approvals

3. Unstructured Content: Advanced analytics and Natural Language Processing (NLP) are needed to extract PII entities with high accuracy
Anonymization: Data anonymization is the process of transforming information by removing or encrypting sensitive data (PII or PHI) in order to protect data subjects' privacy and confidentiality
NLP: The process of transforming and understanding human language to identify meaningful patterns and new insights
Anonymization Techniques:
- Character Masking
- Pseudonymization
- Generalization
- Swapping
- Data Perturbation, etc.
NLP Techniques:
- Information Retrieval
- Natural Language Processing
- Information Extraction (Named Entity Recognition, NER)

The Solution: A measured approach that balances human validation and judgment with analytics and automation
- User-centered solution design
- Collaborative workflows with user feedback
- Leveraged open-source technology
- Custom algorithm training for better domain understanding
- Domain experts helped tailor algorithms for unstructured data
- Strong, scalable solution capabilities based on past experience
- Regulatory and research communities helped us understand the required quality thresholds
- Iterative optimization until the desired EMA and Health Canada controls were met
Pillars: Human-in-the-Loop | Advanced Analytics | Regulatory Compliance
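One way to picture the human-in-the-loop workflow above is a confidence-based review queue. The sketch below is an assumption, not the deck's method: predictions scoring at or above a hypothetical `auto_threshold` are redacted automatically, the rest wait for a reviewer, and each verdict is stored as feedback that could later be used for retraining. The class names and entity labels are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class EntityPrediction:
    text: str
    label: str         # hypothetical labels such as "PATIENT_ID" or "SITE"
    confidence: float  # model score in [0, 1]


@dataclass
class ReviewRouter:
    """Hypothetical human-in-the-loop router: confident predictions are
    redacted automatically, uncertain ones wait for a reviewer."""
    auto_threshold: float = 0.9
    redacted: list[EntityPrediction] = field(default_factory=list)
    review_queue: list[EntityPrediction] = field(default_factory=list)
    feedback: dict[str, bool] = field(default_factory=dict)

    def route(self, predictions: list[EntityPrediction]) -> None:
        # Split predictions by confidence: auto-redact or queue for review.
        for p in predictions:
            if p.confidence >= self.auto_threshold:
                self.redacted.append(p)
            else:
                self.review_queue.append(p)

    def record_decision(self, prediction: EntityPrediction, is_sensitive: bool) -> None:
        """Store a reviewer's verdict; confirmed entities are redacted too.
        The collected labels could later be used to retrain the model."""
        self.review_queue.remove(prediction)
        self.feedback[prediction.text] = is_sensitive
        if is_sensitive:
            self.redacted.append(prediction)


if __name__ == "__main__":
    router = ReviewRouter(auto_threshold=0.9)
    preds = [EntityPrediction("101-0042", "PATIENT_ID", 0.97),
             EntityPrediction("Site 12", "SITE", 0.62)]
    router.route(preds)
    print(len(router.redacted), len(router.review_queue))  # → 1 1
    router.record_decision(preds[1], is_sensitive=True)
    print(len(router.redacted), len(router.review_queue))  # → 2 0
```

Raising `auto_threshold` sends more predictions to reviewers, trading review effort for a lower chance of an automated mistake.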

The Anonymization Solution handles structured and unstructured data with iterative risk scoring to ensure compliance
Inputs: CSR documents, reference population (data on similar trials), user input
Unstructured data transformation: parsing CSR docs → entity recognition → sampling for users to validate → recall calculation
Structured data transformation → iterative risk scoring and optimization algorithm → final risk-adjusted CSR document
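The recall calculation and iterative risk scoring steps in the pipeline above can be sketched as follows. This is a minimal illustration under stated assumptions, not the solution's actual algorithm: recall is computed on a user-validated sample, re-identification risk is scored k-anonymity style as 1 / (size of the matching group in the reference population), and the only transformation is coarsening age into wider bands. The field names (`age`, `sex`, `country`), the 0.09 threshold, and the band widths are illustrative; the real threshold should come from the applicable EMA or Health Canada guidance.

```python
import random
from collections import Counter


def sample_for_review(sentences: list[str], n: int, seed: int = 0) -> list[str]:
    """Draw a reproducible random sample of sentences for users to validate."""
    rng = random.Random(seed)
    return rng.sample(sentences, min(n, len(sentences)))


def recall(predicted: set[str], confirmed: set[str]) -> float:
    """Recall = TP / (TP + FN): share of user-confirmed entities the model found."""
    if not confirmed:
        return 1.0
    return len(predicted & confirmed) / len(confirmed)


def quasi_identifiers(record: dict, age_width: int) -> tuple:
    """Generalize age into bands; sex and country are kept as-is (assumed fields)."""
    low = (record["age"] // age_width) * age_width
    return (f"{low}-{low + age_width - 1}", record["sex"], record["country"])


def max_risk(trial: list[dict], reference: list[dict], age_width: int) -> float:
    """Highest per-record risk, scored as 1 / (size of the matching group in the
    reference population). The reference is assumed to include the trial subjects."""
    counts = Counter(quasi_identifiers(r, age_width) for r in reference)
    return max(1 / max(counts[quasi_identifiers(r, age_width)], 1) for r in trial)


def optimize(trial, reference, threshold=0.09, widths=(1, 5, 10, 20, 50)):
    """Greedy search: coarsen age bands until the maximum risk is at or below the
    threshold. Returns (width, risk), or (None, last_risk) if no width suffices."""
    risk = None
    for width in widths:
        risk = max_risk(trial, reference, width)
        if risk <= threshold:
            return width, risk
    return None, risk


if __name__ == "__main__":
    reference = [{"age": a, "sex": "F", "country": "CA"} for a in range(40, 60)]
    trial = [{"age": 47, "sex": "F", "country": "CA"}]
    print(optimize(trial, reference))  # → (20, 0.05)
    print(recall({"101-0042", "Site 12"}, {"101-0042", "Site 12", "Day 15"}))  # → 0.6666666666666666
```

A wider band is not automatically safer: with 50-year bands the example subject falls into 0-49, which holds only 10 reference records, so a real optimizer would search over band boundaries as well as widths.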

What did we learn from implementing such solutions for clients?
- Be prepared to tackle a variety of input data sources in terms of document structure, style, and entities
- Typical CSRs contain 100+ tables and figures, each of which needs to be treated as an independent problem
- There is a paucity of research on re-identification risk and patient privacy in the pharma clinical space

Where are we headed? Solutions must be geared for more attacks, tightening regulations, and regional variations
- Data breaches have become easier
- Regulations worldwide are evolving, and norms are being tightened
- Region-specific variations of regulations are emerging

Please share your session feedback!
@kesaritweets
/gkesari
Ganes Kesari
www.gramener.com
Thank You!
