Improving health care outcomes with responsible data science #escience2018
1. 31-10-2018
Improving health care outcomes with
responsible data science
Wessel Kraaij
Leiden University & TNO
w.kraaij@liacs.leidenuniv.nl
Outline
1. Overview of value-based approaches to health care and data science
2. Secure federated analysis of health data
3. Patient empowerment: Patient Forum Miner
Digital Health
• Improving health care outcomes using data science (and AI)
• Includes e-health, m-health, etc.
• What do we want? Simultaneously:
1. Better outcomes
2. Better access to healthcare (best healthcare for all)
3. Control cost
Typical machine learning challenges in health / life
sciences: diagnosis, progression, prognosis
Precision medicine, personalized health, vitality
Prediction, prognosis based on personal profiles
Reasoning/decision making with incomplete information
Needs lots of data, data science methods, trusted infrastructure
Early diagnosis and prevention
Integration of multiple weak biomarkers
Anomaly detection
Self-measurement
Avoid costly acute care
...but this hurts current business cases
An increasingly strong role for citizens and patients
Participatory research
Shared decision making
Patient-centered
PROMs (patient-reported outcome measures)
Late effects
Tailoring of information access, communication, coaching
Segmentation
Avoid one-size-fits-all
Sociodemographic differences
E.g. chronic diseases, elderly, low literacy
Value-Based Health Care
$$\text{Value} = \frac{\text{Patient-reported outcomes}}{\text{Cost} + \text{Burden}}$$
Objective function (Michael Porter)
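As a minimal sketch of using this ratio as an objective function, the Python below scores two care options and picks the one with the highest value. The option names, the 0-100 outcome scale and the cost/burden units are hypothetical, chosen only for illustration; they are not data from the talk.

```python
from dataclasses import dataclass


@dataclass
class CareOption:
    """A hypothetical care pathway with aggregated outcome and cost figures."""
    name: str
    patient_reported_outcome: float  # assumed: mean PROM score on a 0-100 scale
    cost: float                      # assumed: cost per patient, arbitrary units
    burden: float                    # assumed: patient burden in the same units as cost


def value(option: CareOption) -> float:
    """Value = patient-reported outcomes / (cost + burden)."""
    denominator = option.cost + option.burden
    if denominator <= 0:
        raise ValueError("cost + burden must be positive")
    return option.patient_reported_outcome / denominator


def best_option(options: list[CareOption]) -> CareOption:
    """Maximise the objective function over the available options."""
    return max(options, key=value)


if __name__ == "__main__":
    options = [
        CareOption("hospital follow-up", patient_reported_outcome=70, cost=50, burden=20),
        CareOption("remote monitoring", patient_reported_outcome=68, cost=30, burden=10),
    ]
    for o in options:
        print(f"{o.name}: value = {value(o):.2f}")
    print("highest value:", best_option(options).name)
```

With these made-up numbers, the option with a slightly lower outcome still wins (1.70 versus 1.00) because cost and burden drop more, which is exactly the trade-off the formula makes explicit.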
Machine-readable, reusable data: FAIR
• Findable
• Accessible (under well-defined conditions)
• Interoperable: machine-readable
• Reusable
• FAIR is a precondition for combining datasets for analysis
• FAIR does not imply open!
Michel Dumontier, Maastricht University
GO FAIR
• Internet of FAIR data and services
• FAIR Metrics
• Implementation Networks (IN)
• Example IN: Personal Health Train
Responsible
Data Science
• Fairness: avoiding prejudice due to
e.g. biased training data
• Accuracy: quality of information and predictions; how certain are they?
• Confidentiality: GDPR, but also commercial confidentiality
• Transparency: can we explain conclusions drawn from big data? (opening the black box)
FAIR data (reusability)
Fairness (avoiding bias)
Accuracy (incl. AI)
Confidentiality
Transparency (incl. AI)
HCI (incl. AI)
Communicating with (a diverse group of) patients
Improving outcomes of the health care system (care, cure, prevention) using different types of data
Making choices based on normalized added health value per cost unit
Relating data science challenges to health care trends
1. Some conclusions
• Combining/comparing personal data across individuals is needed for personalized, preventive health
• Digital health challenges pose highly relevant use cases for data science, AI and HCI
• Existing costing/business models may actually be a barrier to better patient value
2. Secure federated analysis of distributed personal data
Quantified self
Source: MIT
A movement of citizens and ‘makers’ that aims to explore the possibilities of self-tracking.
Gary Wolf (Wired): “Almost everything we do generates data”.
Source: RescueTime
WHAT COULD WE LEARN IF
we could measure and record activities, social context, environmental context, physiological parameters, food intake and sleep across our entire lifetime (sometimes called the exposome)? We now have sensors for most of these.
WE CAN LEARN:
Personalized health advice based on a systems view and population data
Integrating evidence-based medicine and data-driven predictions
The mutual dependence of personal and population health data
Source: cbw.ge and wikimedia.org
From ‘big’ to ‘me’
Towards personalized e-health advice
Model
Personal health profile (longitudinal data)
Population health profiles
Health professional: duty of confidentiality
For e-services:
How can we prevent competitors from learning the model?
How can we prevent a health app from copying personal data?
Health Data Challenges
• Data is organized around treatments
  • Silos
  • Heterogeneous data
  • Patient-generated data not linked
• Need patient/citizen-centered data
  • Undo fragmentation
  • Link person-generated data
• Deal with uncertainty and imperfections
  • Quantify accuracy
  • Missing data
Some barriers for statistical analysis and ML
• Data is horizontally partitioned
  • Distributed learning
  • Personal Health Train
• Data is vertically partitioned
  • Existing practice: trusted third party (TTP)
  • Health Data Cooperative (e.g. Midata)
  • Secure multiparty computation

Horizontally partitioned (same attributes, different persons):
ID  age  GP visits  gender
1   55   70         M
2   45   60         F

ID  age  GP visits  gender
3   20   25         F
4   22   10         M

Vertically partitioned (same persons, different attributes):
ID  age  gender
1   55   M
2   45   F
3   20   F
4   22   M

ID  GP visits
1   70
2   60
3   25
4   10
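For the horizontally partitioned case, distributed learning can be sketched minimally as follows: each site computes only aggregate statistics on its own rows, and only those aggregates leave the site. The two site tables reuse the example rows from the horizontally partitioned tables; the function names are hypothetical. Plain aggregates are not secure in themselves (a site with a single row reveals that row), which is where secure multiparty computation comes in.

```python
# Each site holds the same columns for different persons (horizontal partitioning).
site_a = [
    {"id": 1, "age": 55, "gp_visits": 70, "gender": "M"},
    {"id": 2, "age": 45, "gp_visits": 60, "gender": "F"},
]
site_b = [
    {"id": 3, "age": 20, "gp_visits": 25, "gender": "F"},
    {"id": 4, "age": 22, "gp_visits": 10, "gender": "M"},
]


def local_summary(rows, column):
    """Runs at the data site: returns only (sum, count), never the rows themselves."""
    values = [row[column] for row in rows]
    return sum(values), len(values)


def global_mean(summaries):
    """Runs at the analyst: combines the per-site aggregates into a global mean."""
    total = sum(s for s, _ in summaries)
    count = sum(n for _, n in summaries)
    return total / count


summaries = [local_summary(site, "gp_visits") for site in (site_a, site_b)]
print(global_mean(summaries))  # 41.25
```

The same trick does not carry over to the vertically partitioned tables: relating age to GP visits needs both columns for the same ID, i.e. a join across parties, which is why a trusted third party or secure multiparty computation is needed there.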
Personal Health Train
• Distributed data
  • FAIR data stations
  • Bring the algorithm to the data
  • Approximate global analysis
• FAIR data stations include
  • Clinical repositories
  • Personal lockers (PGO)
  • Umbrella stations (e.g. HDC)
• Trains implement a secure workflow for researchers and patients/citizens
Sign the manifesto at: https://www.dtls.nl/phtmanifesto/
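“Bring the algorithm to the data” can be illustrated with a minimal sketch, assuming a simple linear regression of GP visits on age over horizontally partitioned data: each station returns five aggregate statistics, and the train's home base combines them into a least-squares fit. The station contents reuse the (age, GP visits) pairs from the example tables; the function names are hypothetical. Because these statistics simply add up, this particular model is recovered exactly; for more complex models, stations usually exchange model updates iteratively and the global result may only be approximated.

```python
def station_statistics(pairs):
    """Runs inside a data station on local (x, y) pairs; only five aggregates leave the station."""
    n = len(pairs)
    sum_x = sum(x for x, _ in pairs)
    sum_y = sum(y for _, y in pairs)
    sum_xx = sum(x * x for x, _ in pairs)
    sum_xy = sum(x * y for x, y in pairs)
    return n, sum_x, sum_y, sum_xx, sum_xy


def fit_from_statistics(all_stats):
    """Combines station aggregates into an ordinary least-squares fit y = intercept + slope * x."""
    n, sx, sy, sxx, sxy = (sum(column) for column in zip(*all_stats))
    denominator = n * sxx - sx * sx
    if denominator == 0:
        raise ValueError("x has no variance across the combined data")
    slope = (n * sxy - sx * sy) / denominator
    intercept = (sy - slope * sx) / n
    return intercept, slope


# Hypothetical stations holding (age, GP visits) pairs.
station_1 = [(55, 70), (45, 60)]
station_2 = [(20, 25), (22, 10)]

stats = [station_statistics(s) for s in (station_1, station_2)]
intercept, slope = fit_from_statistics(stats)
print(f"GP visits = {intercept:.1f} + {slope:.2f} * age")
```

The raw rows never travel; only the counts and sums do. Fitting the pooled rows centrally would give the same coefficients.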
NWA startimpuls VWDATA
• Responsible Data Science
  • FACT, FAIR
• Learning from distributed data
  • Develop secure algorithms for learning from a ‘join’ of databases with confidential/personal data
Build a prognostic model using secure regression
H2020 BigMediLytics
• Developing “secure regression” using secure multiparty computation methods
• Aim: improve KPIs by 20% in large-scale trials
• Erasmus MC heart failure pilot
  • KPIs:
    • Increase medication compliance
    • Decrease the number of hospital readmissions
Ergo
Achmea claims
EMC clinical data
Develop and evaluate an intervention informed by the joint longitudinal dataset (based on risk and cost analysis)
Proposed solution: secure multiparty computation
• Several data owners want to ‘learn’ from the combination of their datasets
• However, they cannot release their data
• SMPC makes it possible to overcome this barrier:
  • learning without disclosing
  • extensive communication protocol between partners
  • proven non-disclosure
• TNO/CWI/Philips are developing and evaluating MPC methods:
  • homomorphic encryption
  • garbled circuits
  • secret sharing
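Of the three building blocks, secret sharing is the easiest to show in a few lines. Below is a minimal sketch of additive secret sharing over a prime field, assuming honest-but-curious parties and a hypothetical modulus. Each party splits its private value into random shares, every party adds up the shares it receives, and only the recombined total is revealed. It is a teaching example, not the protocol developed by TNO/CWI/Philips.

```python
import secrets

PRIME = 2**61 - 1  # assumed field size (a Mersenne prime); must exceed any possible total


def share(value: int, n_parties: int) -> list[int]:
    """Split value into n_parties random shares that sum to value modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def secure_sum(private_values: list[int]) -> int:
    """Each party shares its value; party j only ever sees the j-th share of every input."""
    n = len(private_values)
    all_shares = [share(v, n) for v in private_values]  # row i: shares created by party i
    # Party j adds up the shares it received (column j); each single share is uniformly random.
    partial_sums = [sum(row[j] for row in all_shares) % PRIME for j in range(n)]
    # Publishing and adding the partial sums reveals only the total.
    return sum(partial_sums) % PRIME


if __name__ == "__main__":
    gp_visits_per_hospital = [130, 35, 52]  # hypothetical private counts
    print(secure_sum(gp_visits_per_hospital))  # 217
```

Sums come almost for free in this scheme; multiplications (needed for regression) require extra interaction between the parties, which is one reason SMPC protocols involve extensive communication.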
Simple example: secure summation using SMPC
$$
\begin{aligned}
F(n_1) &= R + n_1\\
F(n_2) &= F(n_1) + n_2\\
F(n_3) &= F(n_2) + n_3\\
F(n_4) &= F(n_3) + n_4\\
F(n_5) &= F(n_4) + n_5\\
\Sigma &= F(n_5) - R
\end{aligned}
$$
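The summation protocol on this slide can be simulated in a few lines. This is a minimal sketch assuming five honest-but-curious parties in a ring and arithmetic modulo a hypothetical modulus, so that every running total F(n_i) looks uniformly random to the party receiving it. Party 1 draws the random mask R, the running total travels around the ring, and party 1 removes R at the end. Like the slide, it ignores collusion: the two neighbours of a party can together recover that party's value.

```python
import secrets

MODULUS = 2**64  # assumed: large enough that the true sum never wraps around


def ring_secure_sum(values: list[int]) -> int:
    """Simulate the ring: F(n1) = R + n1, F(ni) = F(n(i-1)) + ni, sum = F(nk) - R."""
    r = secrets.randbelow(MODULUS)        # random mask R, known only to party 1
    running = (r + values[0]) % MODULUS   # F(n1), sent by party 1 to party 2
    for v in values[1:]:
        # Each party sees only the incoming running total, which is masked by R.
        running = (running + v) % MODULUS  # F(ni), passed on to the next party
    # The final total returns to party 1, which subtracts R to obtain the sum.
    return (running - r) % MODULUS


if __name__ == "__main__":
    n = [12, 7, 30, 5, 16]  # hypothetical private values n1..n5
    print(ring_secure_sum(n))  # 70
```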
Assessment
• Cryptographic SMPC protocols have been around for decades
• Their applicability was hampered by the inefficiency of several protocols
• Recent advances in protocols and computing power, together with increased attention to privacy, have renewed interest in the development of SMPC
• MPC has the potential to become a crucial enabler for improving health and health care by learning from data
2. Conclusions
The challenge is to undo fragmentation (this requires collaboration, whereas the usual model is competition)
It is crucial to restore the balance in data governance in order to prevent platform companies from increasingly controlling crucial aspects of our civil society
The Netherlands is well positioned to take the lead in building a public infrastructure for health data analysis.
3. Patient Forum Miner
An example of collective “empowered” participation: Patient
Platform Sarcomas
Sarcoma: a rare cancer
Rare cancer challenge
• Low incidence: small patient cohorts for
research
• Limited data in national registries
• Need for international cooperation
• Limited budgets for research
• In most hospitals there is insufficient experience to treat rare cancers: care needs to be concentrated
Open Facebook group
GIST Support International
Discussion forums:
● Information on:
○ Development of disease, treatment, side-effects
○ Strategies to cope with side-effects
○ Adherence to therapy
○ Co-morbidity
○ Quality of life
● Also: lots of emotional discussions, support
● Use data mining, NLP, machine learning to extract valuable information
caregiver for mother dx with GIST 12/31/12
31x19x9 cm primary tumor removed 12/12/12
400 mg gleevec 1/15/13 to 12/6/13
Recurrence and removal of 3 masses from the omentum (3.5 to 4.4 cm) on 12/6/13
Sutent 25 mg. since 1/15/14 37.5 mg. since 8/14 (failed Sutent 6/15)
Sorefinib (naxavar) 7/3/15
more tumors removed 7/13/15
Votrient 9/4/15 to 12/30/15
50 tumors removed 1/16/16’
Wildtype Gist NO Kit or PDFRA mutation, SDH intact
Analysis method
Public Facebook group: 4 years, 40,000 posts
UMLS tagger
Word2Vec
Categories
Database
UMLS: thesaurus with medical terms
Word2Vec: semantic relationships
Summarisation: machine learning
Elasticsearch
User interface: relevant posts
Automatic summarisation
User interface for text mining of forums
Examples of findings
The normal dose of imatinib for GIST patients is 400 mg per day.
In case of progression, it is increased to 800 mg per day.
In one hospital, the policy was to advise patients to take the full 800 mg at dinner.
Question: would it be better to split the dose?
Based on the analysis of forum discussions: yes, most patients experience fewer side effects when splitting the dose.
Based on this result, the hospital changed its policy.
Something else was found: patients reported that taking the medicine with dark chocolate greatly diminishes nausea.
Example: co-morbidity
○ Searching for patients with a PDGFRA mutation, we found two patients who mentioned that they also had thyroid cancer.
○ Searching for this specific combination, we found around 20 cases.
○ Coincidence, or is there a relationship?
Platelet-derived growth factor receptor α (PDGFRA) mutations occur in 10% of GIST cases.
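The co-morbidity search amounts to looking up posts that mention two concepts together. A minimal sketch, assuming posts have already been tagged with normalised concepts (for example by the UMLS tagger); the post IDs, tags and function names are hypothetical, and the in-memory inverted index stands in for the search engine used in the actual pipeline.

```python
from collections import defaultdict

# Hypothetical posts after concept tagging: post id -> set of normalised concepts.
tagged_posts = {
    101: {"GIST", "PDGFRA mutation", "thyroid cancer"},
    102: {"GIST", "imatinib", "nausea"},
    103: {"GIST", "PDGFRA mutation", "imatinib"},
    104: {"PDGFRA mutation", "thyroid cancer", "surgery"},
}


def build_index(posts):
    """Inverted index: concept -> set of post ids mentioning it."""
    index = defaultdict(set)
    for post_id, concepts in posts.items():
        for concept in concepts:
            index[concept].add(post_id)
    return index


def co_mentions(index, concept_a, concept_b):
    """Post ids that mention both concepts (dict.get does not add keys to the defaultdict)."""
    return index.get(concept_a, set()) & index.get(concept_b, set())


index = build_index(tagged_posts)
print(sorted(co_mentions(index, "PDGFRA mutation", "thyroid cancer")))  # [101, 104]
```

A real analysis would count distinct patients rather than posts, and a co-mention count only flags a candidate relationship; it says nothing about causality.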
Further development
● Building experience: project with four other cancer patient organisations in the Netherlands; combine with surveys, build experience and improve algorithms
● Strengthening scientific basis: detecting patterns, quantification,
statistical analysis, validation, filtering by linking to existing knowledge,
clinical testing of hypotheses (PhD project)
Patient forums are a knowledge gold mine
Poster Anne Dirkson
• Patients can offer information that biomedical articles and clinical records cannot
provide: their experiences
• When aggregated, these experiences could be used to:
  • Allow patients to learn from each other’s experiences, leading to a higher quality of life
  • Generate hypotheses for clinical research
Open Knowledge Discovery and Data Mining from Patient Forums
To develop a methodology for mining patient experiences on health forums for novel knowledge
From anecdotal evidence to trustworthy knowledge
Forum − PubMed articles − ClinicalTrials.gov = Novel knowledge
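Read as set operations, this diagram says: keep what the forum yields and remove whatever is already known from the literature and the trial registry. A minimal sketch, assuming each source has been reduced to (intervention, effect) pairs; all pairs below are hypothetical, apart from echoing the dark chocolate and nausea observation reported by patients.

```python
# Hypothetical (intervention, effect) pairs extracted from each source.
forum_pairs = {
    ("imatinib", "nausea"),
    ("imatinib", "fatigue"),
    ("dark chocolate", "less nausea"),
}
pubmed_pairs = {("imatinib", "nausea")}
clinical_trial_pairs = {("imatinib", "fatigue")}

# Forum - PubMed articles - ClinicalTrials.gov = candidate novel knowledge
novel = forum_pairs - pubmed_pairs - clinical_trial_pairs
print(novel)  # {('dark chocolate', 'less nausea')}
```

The set difference is only meaningful once forum language has been mapped to the same concepts as the literature, which is what concept normalization and entity extraction are for.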
Concept normalization using custom lexicons and domain-specific spelling correction
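Concept normalization can be sketched with the Python standard library: a small custom lexicon maps brand names to generic drug names, and fuzzy matching with `difflib.get_close_matches` stands in for domain-specific spelling correction. The lexicon, the cutoff of 0.75 and the function names are assumptions; the misspelled input is taken from the forum post quoted earlier. This does not reproduce the miner's actual lexicons or correction method.

```python
import difflib
import re

# Hypothetical custom lexicon: surface form -> normalised concept (generic drug name).
LEXICON = {
    "gleevec": "imatinib", "imatinib": "imatinib",
    "sutent": "sunitinib", "sunitinib": "sunitinib",
    "nexavar": "sorafenib", "sorafenib": "sorafenib",
    "votrient": "pazopanib", "pazopanib": "pazopanib",
}


def normalise_token(token: str, cutoff: float = 0.75):
    """Map a token to a concept, correcting near-miss spellings against the lexicon keys."""
    word = token.lower()
    if word in LEXICON:
        return LEXICON[word]
    candidates = difflib.get_close_matches(word, LEXICON.keys(), n=1, cutoff=cutoff)
    return LEXICON[candidates[0]] if candidates else None


def normalise_text(text: str) -> list[str]:
    """Return the drug concepts mentioned in free text, in order of appearance."""
    tokens = re.findall(r"[A-Za-z]+", text)
    concepts = (normalise_token(t) for t in tokens)
    return [c for c in concepts if c is not None]


print(normalise_text("Sorefinib (naxavar) 7/3/15"))  # ['sorafenib', 'sorafenib']
print(normalise_text("400 mg gleevec 1/15/13 to 12/6/13"))  # ['imatinib']
```

The cutoff trades recall against false corrections: lowering it catches worse misspellings but starts mapping unrelated words onto drug names.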
Entity extraction using conditional random fields. Features include context, semantic and syntactic features, word embeddings, and mapping to UMLS
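CRF-based entity extraction works on one feature dictionary per token. A minimal sketch of such a feature function, covering the token itself, simple word-shape cues, a lexicon lookup standing in for the UMLS mapping, and a one-token context window; the feature names and the two-entry lexicon are hypothetical, and part-of-speech and word-embedding features are left out. Lists of dictionaries in this shape are the input format that CRF toolkits such as sklearn-crfsuite accept.

```python
# Hypothetical stand-in for a UMLS lookup: lower-cased term -> UMLS semantic type.
CONCEPT_LEXICON = {"nausea": "Sign or Symptom", "imatinib": "Pharmacologic Substance"}


def token_features(tokens: list[str], i: int) -> dict:
    """Features for token i: the word, its shape, a lexicon hit, and a +/-1 context window."""
    word = tokens[i]
    features = {
        "bias": 1.0,
        "word.lower": word.lower(),
        "word.suffix3": word[-3:].lower(),
        "word.istitle": word.istitle(),
        "word.isdigit": word.isdigit(),
        "umls.type": CONCEPT_LEXICON.get(word.lower(), "NONE"),
    }
    if i > 0:
        features["prev.lower"] = tokens[i - 1].lower()
    else:
        features["BOS"] = True  # beginning of sentence
    if i < len(tokens) - 1:
        features["next.lower"] = tokens[i + 1].lower()
    else:
        features["EOS"] = True  # end of sentence
    return features


def sentence_features(tokens: list[str]) -> list[dict]:
    """One feature dictionary per token, ready to be paired with a label sequence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


sentence = "Imatinib gave me nausea".split()
features = sentence_features(sentence)
print(features[3]["umls.type"], features[3]["prev.lower"])  # Sign or Symptom me
```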
3. CONCLUSIONS
● Patient experience discussion forums can be very valuable, in particular for rare diseases (worldwide community)
● Patient forums can reveal unexpected patterns
and can provide information that would
otherwise never reach doctors
○ Care must be taken to avoid experimenter bias
● This project demonstrates the power of
partnership between patients and medical
professionals
Finally…
• Improving health care affects us all:
  • Data and algorithms can support patients and care professionals.
  • Patient value should be more important than shareholder value.
• Health data is sensitive; people should be in control of access.
• GDPR stimulates innovation in data science:
  • Explainability
  • Protection of personal data
References
• P4
  • Hood, Leroy, and Stephen H. Friend. 2011. “Predictive, Personalized, Preventive, Participatory (P4) Cancer Medicine.” Nature Reviews Clinical Oncology 8 (3): 184–87. doi:10.1038/nrclinonc.2010.227.
• VBHC
  • Porter, Michael E. 2010. “What Is Value in Health Care?” New England Journal of Medicine 363 (26): 2477–81. doi:10.1056/NEJMp1011024.
• FAIR
  • GO-FAIR: https://www.go-fair.org/
  • Wilkinson, Mark D., Michel Dumontier, Ijsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, et al. 2016. “The FAIR Guiding Principles for Scientific Data Management and Stewardship.” Scientific Data 3 (March): 160018. doi:10.1038/sdata.2016.18.
• FACT
  • http://www.responsibledatascience.org/
• NWA VWDATA
  • https://wetenschapsagenda.nl/start-vwdata-onderzoeksprogramma/
• Personal Health Train
  • Damiani, Andrea, Mauro Vallati, Roberto Gatta, Nicola Dinapoli, Arthur Jochems, Timo Deist, Johan van Soest, Andre Dekker, and Vincenzo Valentini. 2015. “Distributed Learning to Protect Privacy in Multi-Centric Clinical Studies.” In The 15th Conference on Artificial Intelligence in Medicine, edited by J. H. Holmes, R. Bellazzi, L. Sacchi, and N. Peek, 65–75. Pavia, Italy: Springer. http://eprints.hud.ac.uk/23905/.
• Secure multiparty computation
  • Veeningen, Meilof, Supriyo Chatterjea, Anna Zsófia Horváth, Gerald Spindler, Eric Boersma, Peter van der Spek, Onno van der Galiën, Job Gutteling, Wessel Kraaij, and Thijs Veugen. 2018. “Enabling Analytics on Sensitive Medical Data with Secure Multi-Party Computation.” In Proceedings of Medical Informatics Europe 2018. Gothenburg.
• Patient forum miner
  • Oortmerssen, Gerard van, Stephan Raaijmakers, Maya Sappelli, Erik Boertjes, Suzan Verberne, Nicole Walasek, and Wessel Kraaij. 2017. “Analyzing Cancer Forum Discussions with Text Mining.” In Proceedings of Second International Workshop on Extraction and Processing of Rich Semantics from Medical Texts. Vienna.