Invited talk at the health track of ICT.OPEN 2018, 20-3-2018
1. Relating data science challenges to digital health trends
2. Designing an infrastructure to support secure learning from distributed health data repositories, for personalized health advice
3. Supporting patients with rare diseases through patient-driven research and the generation of new hypotheses based on patient experiences.
Personal health data to personalized advice
1. From personal health data to
personalized advice
Wessel Kraaij
Leiden University & TNO
w.kraaij@liacs.leidenuniv.nl
2. Outline
1. Overview of value based
approaches to health care and
data science
2. Secure federated analysis of
health data
3. Patient Empowerment:
Patient Forum Miner
3. Digital Health
• Improving Health care outcomes using data science
(and AI)
• Includes E-health, M-health etc
• What do we want? Simultaneously:
1. Better outcomes
2. Better access to healthcare (best healthcare for
all)
3. Reducing cost
11. Machine-readable, reusable data: FAIR
• Findable
• Accessible (under well defined conditions)
• Interoperable: Machine readable
• Reusable
• Secondary goal: combine datasets for
analysis
• FAIR does not imply open!!
Michel Dumontier, Maastricht University
12. GO FAIR
• Internet of FAIR data and services
• FAIR Metrics
• Implementation Networks (IN)
• Example IN: Personal Health Train
13. Responsible
Data Science
• Fairness: avoiding prejudice due to
e.g. biased training data
• Accuracy: quality of information,
prediction; how certain?
• Confidentiality: GDPR, also
commercially confidential
• Transparency: Can we explain
conclusions drawn from big data?
(opening the Black Box)
14. Relating data science challenges to health care trends
Data science challenges:
• FAIR data (reusability)
• Fairness (avoiding bias)
• Accurate (incl AI)
• Confidential
• Transparent (incl AI)
• HCI (incl AI)
Health care trends:
• Communicating with (a diverse group of) patients
• Improving outcomes of health care system (care, cure,
prevention) using different types of data
• Making choices based on normalized added health value
per cost unit
15. 1. Some
conclusions
• Combining / comparing personal data across
individuals is needed for personalized,
preventive health
• Digital Health challenges pose quite relevant
use cases for Data Science, AI and HCI
• Existing costing/business models may actually
be a barrier for better patient value
17. Quantified self
source: MIT
A movement of citizens and
‘makers’ that aim to explore the
possibilities of self-tracking.
Gary Wolf (Wired): “Almost
everything we do generates
data”.
source: RescueTime
WHAT COULD WE LEARN IF
we could measure and record activities, social context,
environmental context, physiological parameters, food
intake, sleep across our entire lifetime? Now we have
sensors for most of these.
WE CAN LEARN:
Personalized health advice based on systems view and
population data
Integrating evidence based medicine and data driven
predictions
18. The mutual dependence of personal and
population health data
Source: cbw.ge and wikimedia.org
From ‘big’ to me
19. Towards personalized e-advice
Model
Personal health profile
(longitudinal data)
Population health profiles
Health professional: duty of confidentiality
For e-services:
How can we prevent competitors from learning the model?
How can we prevent a health app from copying personal data?
20. Health Data Challenges
• Data is organized around treatments
• Silos
• Heterogeneous data
• Patient generated data not linked
• Need patient/citizen centered data
• Undo fragmentation
• Connect person generated data
• Deal with uncertainty and imperfections
• Quantify accuracy
• Missing data
• User centered visualization
21. Some barriers for statistical analysis and ML
• Data is horizontally partitioned
• Distributed learning
• Personal Health Train
• Data is vertically partitioned
• Existing practice: Trusted 3rd party (TTP)
• Health Data Cooperative (e.g. Midata)
• Secure multiparty computation
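Horizontally partitioned (same columns, rows split over two sources):
ID  age  income  sex
1   55   70000   M
2   45   60000   F

ID  age  income  sex
3   20   25000   F
4   22   20000   M

Vertically partitioned (same individuals, columns split over two sources):
ID  age  sex
1   55   M
2   45   F
3   20   F
4   22   M

ID  income
1   70000
2   60000
3   25000
4   20000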
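The two kinds of partitioning on this slide can be made concrete with a small sketch. It reuses the toy records from the tables above; the function names and the assignment of rows to "sites" are illustrative assumptions, not part of the talk. Note that both functions pool the raw data in one place, which is exactly what the approaches listed on the slide try to avoid.

```python
# Horizontal partitioning: two sources hold the same columns for different people.
site_a = [
    {"id": 1, "age": 55, "income": 70000, "sex": "M"},
    {"id": 2, "age": 45, "income": 60000, "sex": "F"},
]
site_b = [
    {"id": 3, "age": 20, "income": 25000, "sex": "F"},
    {"id": 4, "age": 22, "income": 20000, "sex": "M"},
]

# Vertical partitioning: two sources hold different columns for the same people.
demographics = [
    {"id": 1, "age": 55, "sex": "M"},
    {"id": 2, "age": 45, "sex": "F"},
    {"id": 3, "age": 20, "sex": "F"},
    {"id": 4, "age": 22, "sex": "M"},
]
incomes = [
    {"id": 1, "income": 70000},
    {"id": 2, "income": 60000},
    {"id": 3, "income": 25000},
    {"id": 4, "income": 20000},
]


def combine_horizontal(*parts):
    """Append the rows of all parts (a union of records)."""
    rows = [dict(row) for part in parts for row in part]
    return sorted(rows, key=lambda r: r["id"])


def combine_vertical(left, right, key="id"):
    """Join the columns of two parts on a shared identifier."""
    right_by_key = {row[key]: row for row in right}
    return [
        {**row, **right_by_key[row[key]]}
        for row in left
        if row[key] in right_by_key
    ]


full_from_rows = combine_horizontal(site_a, site_b)
full_from_columns = combine_vertical(demographics, incomes)
```

Both calls reconstruct the same four complete records. The vertical case needs a shared identifier to join on, which is why linking it across organisations is the harder privacy problem.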
22. Traditional solution –
trusting a third party
• For research:
• create anonymized/ pseudonymized datasets,
possibly using a trusted third party
• Anonymization: remove identifiers
• Export anonymized dataset for research
(academic/ commercial)
23. Assessment
• The more personal data is combined, the easier it is to re-
identify a profile and the more difficult the anonymization
process is
• E.g. personal well-being data (Fitbit, smartphone apps)
are quite personal and could lead to re-identification
using external data.
• Personal data may seem innocent, but can lead to valuable
insights
• GDPR: Strict regulation on data processing and storage
• Data leaks can lead to substantial fines
• Professionals are reluctant to use big data approaches
• The importance of informed consent.
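The first point of this assessment can be illustrated with a minimal sketch that counts how many records become unique as more attributes are combined. The records, attribute names and values are invented for illustration; the check is a simple uniqueness count in the spirit of k-anonymity, not a complete risk assessment.

```python
from collections import Counter


def unique_profiles(records, quasi_identifiers):
    """Return the records whose combination of quasi-identifiers occurs only once."""
    def key(record):
        return tuple(record[q] for q in quasi_identifiers)

    counts = Counter(key(r) for r in records)
    return [r for r in records if counts[key(r)] == 1]


# Hypothetical "anonymized" export: names removed, other attributes kept.
records = [
    {"age_group": "40-49", "sex": "F", "postcode": "2333", "daily_steps": "high"},
    {"age_group": "40-49", "sex": "F", "postcode": "2333", "daily_steps": "low"},
    {"age_group": "40-49", "sex": "F", "postcode": "2311", "daily_steps": "high"},
    {"age_group": "50-59", "sex": "M", "postcode": "2333", "daily_steps": "low"},
    {"age_group": "50-59", "sex": "M", "postcode": "2333", "daily_steps": "low"},
]

# With two attributes nobody stands out; with four, most records are unique.
few_attributes = unique_profiles(records, ["age_group", "sex"])
many_attributes = unique_profiles(
    records, ["age_group", "sex", "postcode", "daily_steps"]
)
```

A unique record is not yet an identified person, but it can be matched against external data that contains the same attributes together with a name.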
24. Towards a citizen driven
healthcare economy:
> Citizens together form a
Cooperative and a Community
> The cooperative delivers the platform
and governance structure
> Enables individuals to collect their
data (medical and lifestyle)
> Provides services for members and
delivers services to customers
> Data is controlled by citizen and
patients themselves
> Rely on their cooperative
for support
HOLLAND HEALTH DATA COOPERATIVE:
CONTROL, SELF-MANAGEMENT, TRUST, REWARD, VIRTUAL SAFE
25. Personal Health Train
• Distributed data
• FAIR data stations
• Bring the algorithm to the data
• Approximate global analysis
• FAIR data stations include
• Clinical repositories
• Personal lockers (PGO)
• Umbrella stations (e.g. HDC)
• Trains implement secure workflow for
researchers and patients/citizens
Sign the manifesto at: https://www.dtls.nl/phtmanifesto/
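"Bring the algorithm to the data" can be sketched in a few lines: each station computes local aggregates, and only those aggregates travel back to the researcher. The station names, values and function names below are assumptions for illustration and do not describe the actual Personal Health Train software.

```python
def local_summary(values):
    """Runs inside a data station: returns aggregates only, never the raw values."""
    return {
        "n": len(values),
        "sum": sum(values),
        "sum_sq": sum(v * v for v in values),
    }


def global_mean_and_variance(summaries):
    """Runs at the researcher's side: combines the station summaries."""
    n = sum(s["n"] for s in summaries)
    total = sum(s["sum"] for s in summaries)
    total_sq = sum(s["sum_sq"] for s in summaries)
    mean = total / n
    variance = total_sq / n - mean ** 2  # population variance
    return mean, variance


# Hypothetical stations, each holding the ages of its own participants.
stations = {
    "station_a": [55, 45],
    "station_b": [20, 22],
}

summaries = [local_summary(values) for values in stations.values()]
mean, variance = global_mean_and_variance(summaries)
```

For a mean and a variance the combined result is exact. For models that have to be fitted iteratively across stations the outcome is typically an approximation of the centralised analysis. Stations with very few records would still need a minimum group size, since a summary over one person reveals that person's value.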
26. NWA startimpuls
VWDATA
• Responsible Data Science
• FACT, FAIR
• Learning from distributed data
• Develop secure algorithms for
learning from a ‘join’ of
databases with
confidential/personal data
28. Build prognostic model using secure regression
H2020 BigMediLytics
• Developing “secure regression” using
secure multiparty computation
methods
• Aim: improve KPIs by 20% in large-scale
trials
• Erasmus MC Heart Failure pilot
• KPIs
• Increase medication compliance
• Decrease number of hospital
readmissions
Data sources: Ergo, Achmea claims, EMC clinical data
Develop and evaluate intervention informed by
the joint longitudinal dataset (based on risk and
cost analysis)
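Secure multiparty computation lets several parties compute a joint result without revealing their inputs to each other. The sketch below shows additive secret sharing for a sum, one of the basic building blocks of such protocols. It is not the secure regression developed in the project, and the modulus, function names and input values are assumptions for illustration.

```python
import secrets

MODULUS = 2**61 - 1  # all arithmetic is done modulo this number


def share(value, n_parties):
    """Split an integer into n random-looking shares that add up to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def secure_sum(private_values):
    """Simulate a secure sum in which party j only ever sees the j-th shares."""
    n = len(private_values)
    # Each party splits its own value and sends share j to party j.
    all_shares = [share(v, n) for v in private_values]
    # Each party adds up the shares it received and publishes that partial sum.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)
    ]
    # The published partial sums reveal only the total.
    return sum(partial_sums) % MODULUS


# Hypothetical private inputs held by three different parties.
total = secure_sum([70000, 60000, 25000])
```

The simulation runs in one process, so it only demonstrates the arithmetic. In a real deployment each party runs on its own infrastructure and the shares are exchanged over secure channels.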
29. 2.
Conclusions
• The challenge is to fight fragmentation
(this needs collaboration; the usual model is
competition)
• It is crucial to restore the balance in
data governance, to prevent platform
companies from increasingly controlling
crucial aspects of our civil society
• NL is well positioned to take the lead in
building a public infrastructure for
health data analysis.
30. 3. Patient
Forum Miner
An example of collective “empowered” participation: Patient
Platform Sarcomas
32. Rare cancer challenge
• Low incidence: small patient cohorts for
research
• Limited data in national registration
• Need for international cooperation
• Limited budgets for research
• In most hospitals not enough
experience to treat rare cancers: need
for concentration
36. Discussion forums:
● Information on:
○ Development of disease, treatment, side-effects
○ Strategies to cope with side-effects
○ Adherence to therapy
○ Co-morbidity
○ Quality of life
● Also: lots of emotional discussions, support
● Use data mining, NLP, machine learning to extract valuable information
caregiver for mother dx with GIST 12/31/12
31x19x9 cm primary tumor removed 12/12/12
400 mg gleevec 1/15/13 to 12/6/13
Recurrence and removal of 3 masses from
the omentum (3.5 to 4.4 cm) on 12/6/13
Sutent 25 mg. since 1/15/14 37.5 mg. since
8/14 (failed Sutent 6/15)
Sorefinib (naxavar) 7/3/15
more tumors removed 7/13/15
Votrient 9/4/15 to 12/30/15
50 tumors removed 1/16/16’
Wildtype Gist NO Kit or PDFRA mutation, SDH
intact
37. Analysis method
Pipeline:
• Source: public Facebook group, 4 years, 40,000 posts
• Annotation: UMLS tagger and Word2Vec
• Categories stored in a database
• Elasticsearch index
• User interface: relevant posts, automatic summarisation
Components:
• UMLS: thesaurus with medical terms
• Word2Vec: semantic relationships
• Summarisation: machine learning
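The tagging step can be illustrated with a minimal dictionary-based tagger that maps brand names and spelling variants to one preferred term, so that a search on the preferred term finds all relevant posts. The mini-thesaurus is a hand-made stand-in for UMLS, and the posts and function names are invented for illustration; the actual Patient Forum Miner components are not shown here.

```python
import re

# Hypothetical mini-thesaurus: surface form -> (preferred term, category).
THESAURUS = {
    "gleevec": ("imatinib", "drug"),
    "imatinib": ("imatinib", "drug"),
    "sutent": ("sunitinib", "drug"),
    "votrient": ("pazopanib", "drug"),
    "nausea": ("nausea", "side effect"),
    "fatigue": ("fatigue", "side effect"),
}


def tag_post(text):
    """Return the set of (preferred term, category) pairs mentioned in a post."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {THESAURUS[token] for token in tokens if token in THESAURUS}


def search(posts, preferred_term):
    """Return the posts that mention the term under any of its surface forms."""
    return [
        post
        for post in posts
        if any(term == preferred_term for term, _ in tag_post(post))
    ]


# Invented example posts.
posts = [
    "400 mg Gleevec since January, some nausea in the evening",
    "Switched to Sutent last year",
    "Does anyone else get fatigue on imatinib?",
]

hits = search(posts, "imatinib")
```

Exact token matching misses misspellings, which are common in forum posts. Word embeddings such as Word2Vec are one way to find additional variants that occur in similar contexts.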
39. Examples of findings
Normal dose of imatinib for GIST patients is 400 mg per day.
In case of progression: increased to 800 mg per day.
In one hospital, policy was to advise patients to take full 800 mg at dinner.
Question: would it be better to split the dose?
Based on forum discussion analysis: yes, most patients experience fewer side
effects when splitting the dose.
Based on this result, the hospital's policy was changed.
Something else was found: patients reported that taking the medicine with
dark chocolate greatly diminishes nausea.
40. Example: co-morbidity
○ Searching for patients with PDGFRa mutation we found
two patients who mentioned that they also had thyroid
cancer.
○ Searching for this specific combination we found around
20 cases
○ coincidence or is there a relationship?
platelet-derived growth factor receptor α (PDGFRA) mutation, occurs in 10% of GIST cases
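A co-morbidity search of this kind comes down to finding authors who mention both conditions somewhere in their posts. The sketch below shows that idea with invented posts and a hypothetical function name; it is plain substring matching and does not handle negation such as "no thyroid cancer".

```python
def authors_mentioning_all(posts, term_groups):
    """Return authors whose posts, taken together, mention every term group.

    Each group is a list of alternative spellings; one match per group suffices.
    """
    seen = {}
    for author, text in posts:
        lowered = text.lower()
        found = seen.setdefault(author, set())
        for index, group in enumerate(term_groups):
            if any(term in lowered for term in group):
                found.add(index)
    return sorted(
        author for author, found in seen.items() if len(found) == len(term_groups)
    )


# Invented (author, text) pairs.
posts = [
    ("user_a", "Diagnosed with a PDGFRA mutation last year"),
    ("user_a", "I also had thyroid cancer in 2010"),
    ("user_b", "PDGFRa exon 18 here, no other cancers"),
    ("user_c", "Thyroid cancer survivor, KIT exon 11"),
]

term_groups = [["pdgfra", "pdfra"], ["thyroid cancer"]]
matches = authors_mentioning_all(posts, term_groups)
```

Counting matching authors only produces a hypothesis. Deciding between coincidence and a real relationship requires comparison with the expected co-occurrence rate and clinical follow-up.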
41. Example: research agenda
● Patients participate in agenda setting for research in university
hospital
● Longlist of topics was composed on basis of analysis of
(inter)national discussion fora: What is important for patients?
● Longlist was used to consult patients through web survey:
What are your priorities and why?
● 75 patients responded
● 3 topics were selected and are now included in a PhD project
● 1 topic could be addressed on basis of existing knowledge
published in magazine of patient organisation
42. Survey results per theme
OPTION  THEME                               TOTAL # PER THEME
1A      Surgery of metastases               31
1B      RFA                                 11
1C      Embolisation                        9
2A      Cramps and nausea                   8
2B      Skin problems                       5
2C      Fatigue                             22
2D      Long term side effects of imatinib  30
3A      Best way to take imatinib           21
3B      Interaction with food               28
4A      Last phase of disease               17
4B      Micro-gists                         8
4C      Hereditary aspects                  15
4D      Combination with other cancers      6
TOTAL                                       211
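The counts in the table above can be checked and ranked with a few lines of code. This is only a sketch of the arithmetic; the slides do not say that the three selected research topics were simply the three highest-ranked ones.

```python
# Counts copied from the table above.
votes = {
    "1A Surgery of metastases": 31,
    "1B RFA": 11,
    "1C Embolisation": 9,
    "2A Cramps and nausea": 8,
    "2B Skin problems": 5,
    "2C Fatigue": 22,
    "2D Long term side effects of imatinib": 30,
    "3A Best way to take imatinib": 21,
    "3B Interaction with food": 28,
    "4A Last phase of disease": 17,
    "4B Micro-gists": 8,
    "4C Hereditary aspects": 15,
    "4D Combination with other cancers": 6,
}

total = sum(votes.values())

# Rank themes from most to least often chosen.
ranked = sorted(votes.items(), key=lambda item: item[1], reverse=True)
top_three = [theme for theme, _ in ranked[:3]]

# Share of all choices that went to the three most chosen themes.
top_three_share = sum(count for _, count in ranked[:3]) / total
```

The total matches the 211 reported on the slide, and the three most chosen themes together account for a bit over 40% of all choices.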
43. Further development
● Gaining experience, fine-tuning of software
● Building experience: project with 4 other cancer patient organisations in
the Netherlands, combined with surveys, to improve the
algorithms
● Strengthening scientific basis: detecting patterns, quantification,
statistical analysis, validation, filtering by linking to existing knowledge,
clinical testing of hypotheses (PhD project)
44. 3. CONCLUSIONS
● Patient discussion forums can be very valuable,
in particular in case of rare diseases; worldwide
community
● Patient forums can reveal unexpected patterns
and can provide information that would
otherwise never reach doctors
● This project demonstrates the power of
partnership between patients and medical
professionals
45. Finally
• Improving Health Care affects us all:
• Data, algorithms can support patients and care
professionals.
• Patient value should be more important than
shareholder value
• Health data is sensitive, people should be in
control regarding access.
• GDPR offers incentives for innovating data
decision support infrastructures
• Explainability
• Protection of personal data
46. References
• P4
• Hood, Leroy, and Stephen H. Friend. 2011. “Predictive, Personalized, Preventive, Participatory (P4) Cancer Medicine.” Nature Reviews Clinical Oncology 8
(3): 184–87. doi:10.1038/nrclinonc.2010.227.
• VBHC
• Porter, Michael E. 2010. “What Is Value in Health Care?” New England Journal of Medicine 363 (26): 2477–81. doi:10.1056/NEJMp1011024.
• FAIR
• GO-FAIR: https://www.go-fair.org/
• Wilkinson, Mark D., Michel Dumontier, Ijsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, et al. 2016. “The FAIR
Guiding Principles for Scientific Data Management and Stewardship.” Scientific Data 3 (March): 160018. doi:10.1038/sdata.2016.18.
• FACT
• http://www.responsibledatascience.org/
• NWA VWDATA
• https://wetenschapsagenda.nl/start-vwdata-onderzoeksprogramma/
• Personal Health Train
• Damiani, Andrea, Mauro Vallati, Roberto Gatta, Nicola Dinapoli, Arthur Jochems, Timo Deist, Johan van Soest, Andre Dekker, and Vincenzo Valentini. 2015.
“Distributed Learning to Protect Privacy in Multi-Centric Clinical Studies.” In The 15th Conference on Artificial Intelligence in Medicine, edited by J. H.
Holmes, R. Bellazzi, L. Sacchi, and N. Peek, 65–75. Pavia, Italy: Springer. http://eprints.hud.ac.uk/23905/.
• Secure multiparty computation
• Veeningen, Meilof, Supriyo Chatterjea, Anna Zsófia Horváth, Gerald Spindler, Eric Boersma, Peter van der Spek, Onno van der Galiën, Job
Gutteling, Wessel Kraaij, and Thijs Veugen. 2018. “Enabling Analytics on Sensitive Medical Data with Secure Multi-Party Computation.” In Proceedings of
Medical Informatics Europe 2018. Gothenburg.
• Patient forum miner
• Oortmerssen, Gerard van, Stephan Raaijmakers, Maya Sappelli, Erik Boertjes, Suzan Verberne, Nicole Walasek, and Wessel Kraaij. 2017. “Analyzing Cancer
Forum Discussions with Text Mining.” In Proceedings of Second International Workshop on Extraction and Processing of Rich Semantics from Medical
Texts. Vienna.