Dekker, TROG - Learning outcome prediction models from cancer data - 2017
1. Learning outcome prediction models
from cancer data
Andre Dekker
Department of Radiation Oncology (MAASTRO)
GROW - Maastricht University Medical Centre +
Maastricht, The Netherlands
SLIDES AVAILABLE ON SLIDESHARE
(slideshare.net/AndreDekker)
2. 2
Disclosures
• Research collaborations incl. funding and speaker honoraria
– Varian (VATE, SAGE, ROO, chinaCAT, euroCAT), Siemens (euroCAT), Sohard (SeDI,
CloudAtlas), Mirada Medical (CloudAtlas), Philips (EURECA, TraIT, SWIFT-RT, BIONIC),
Xerox (EURECA), De Praktijkindex (DLRA), ptTheragnostic (DART, Strategy), CZ (My
Best Treatment)
• Public research funding
– Radiomics (USA-NIH/U01CA143062), euroCAT (EU-Interreg), duCAT & Strategy
(NL-STW), EURECA (EU-FP7), SeDI & CloudAtlas & DART (EU-EUROSTARS), TraIT
(NL-CTMM), DLRA (NL-NVRO), BIONIC (NWO)
• Spin-offs and commercial ventures
– MAASTRO Innovations B.V. (CSO)
– Various patents on medical machine learning
3. 3
TROG 2017 talks
• Learning outcome prediction models from cancer data
– Technical Research Workshop, Monday 8:40-9:10, followed by
Panel Discussion
• Big Data in Radiation Oncology
– Statistical Methods, Evidence Appraisal and Research for
Trainees, Monday 14:50-15:20
• Knowledge Engineering in Oncology
– TROG Plenary, Tuesday, 9:25-10:00
• Radiomics for Oncology
– TROG Plenary, Thursday, 11:50-12:20
(Slide annotations: "Some overlap", "No overlap" between the talks above)
4. 4
Learning objectives
After the lecture, attendees should be able to
• Name the major sources of cancer data and their absolute and relative size
• Understand the challenges of sharing data and solutions to these
• Itemize steps in the methodology to go from data to models
• Appraise papers that describe models incl. using TRIPOD
7. 7
Barriers to sharing data
[..] the problem is not really technical […]. Rather, the problems
are ethical, political, and administrative.
Lancet Oncol 2011;12:933
1. Administrative (I don’t have the resources)
2. Political (I don’t want to)
3. Ethical (I am not allowed to)
4. Technical (I can’t)
8. 8
Common approaches to sharing
• Sharing standardized, highly curated data from clinical
research programs
• Very useful, but only 3% of patients (if that)
• Sharing standardized, highly curated data to clinical
registries
• Very useful, but limited amount of features and a lot of
work
• Big Data companies, usually cloud-based (Watson
Health Cloud, Flatiron/Google, ASCO/SAP CancerLinQ)
• Worries about privacy, loss of control, limited
reusability, silos
9. 9
Data landscape
• Clinical research
• 3% of patients
• 100% of features
• 5% missing
• 285 data points
• Clinical registries
• 100% of patients
• 3% of features
• 20% missing
• 240 data points
• Clinical routine
• 100% of patients
• 100% of features
• 80% missing
• 2000 data points
(Chart axes: data elements vs. patients)
10. 10
A different approach
• If sharing is the problem: Don’t share the data
• If you can’t bring the data to the research
• You have to bring the research to the data
• Challenges
– The research application has to be distributed (trains & track)
– The data has to be understandable by an application (i.e. not a human) -> FAIR data stations
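The "bring the research to the data" idea can be sketched in a few lines. This is a minimal toy of federated gradient averaging for logistic regression, assuming a central coordinator and two hypothetical hospital datasets; it illustrates the trains-and-track pattern, not the actual euroCAT implementation.

```python
import math

def local_gradient(weights, X, y):
    """Logistic-regression gradient on one hospital's private data.
    Only this aggregate leaves the site; the patient records never do."""
    grad = [0.0] * len(weights)
    for xi, yi in zip(X, y):
        z = sum(w * x for w, x in zip(weights, xi))
        p = 1.0 / (1.0 + math.exp(-z))
        for j, xij in enumerate(xi):
            grad[j] += (p - yi) * xij
    return [g / len(y) for g in grad]

def federated_fit(sites, n_features, lr=0.5, epochs=300):
    """The learning 'train' visits every data station: each site returns
    its local gradient, the coordinator averages them and updates the model."""
    w = [0.0] * n_features
    for _ in range(epochs):
        grads = [local_gradient(w, X, y) for X, y in sites]
        avg = [sum(g[j] for g in grads) / len(grads) for j in range(n_features)]
        w = [wj - lr * a for wj, a in zip(w, avg)]
    return w

# Two hypothetical hospitals; column 0 is a bias term, column 1 a risk feature
site_a = ([[1.0, 0.1], [1.0, 0.2], [1.0, 0.9]], [0, 0, 1])
site_b = ([[1.0, 0.3], [1.0, 0.8], [1.0, 0.7]], [0, 1, 1])
weights = federated_fit([site_a, site_b], n_features=2)
```

Because every site only ever exports a gradient vector, the coordinator learns the same model it would on pooled data without any record crossing an institutional boundary.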
12. 12
Typical Data Quality challenges
• Data are unstructured
• Data are not understandable
• Data are missing
• Data are incorrect
• Data are contradicting
• Data are biased
• Data are missing in a biased way (not missing at random)
• Garbage in – Garbage out?
Examples:
• 声门下区 (Chinese: "subglottic region")
• T4N0M0 Stage IV patient
• Patient weighing 1000 kg
• Grade 3+ toxicities
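Problems like these can be caught with simple plausibility rules before any learning starts. A sketch, with field names and thresholds that are illustrative only (not from the talk):

```python
def check_record(rec):
    """Flag records with the kinds of problems listed above."""
    issues = []
    w = rec.get("weight_kg")
    if w is None:
        issues.append("weight missing")
    elif not 20 <= w <= 400:
        issues.append(f"implausible weight: {w} kg")
    # Rough contradiction check: M0 in the TNM string together with an
    # overall stage of IV usually needs review (though some TNM editions
    # do allow T4N0M0 to map to stage IV for certain tumour sites).
    if "M0" in rec.get("tnm", "") and rec.get("stage") == "IV":
        issues.append("TNM (M0) vs overall stage (IV): review for contradiction")
    return issues

flags = check_record({"weight_kg": 1000, "tnm": "T4N0M0", "stage": "IV"})
```

Rules like this turn "garbage in" into an explicit worklist instead of silent noise in the model.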
16. 16
A bit more technical detail
• Keep data locally
• Standardize it according to
an ontology
• Make and send around
learning “bots”
• Share the results - not the
data!
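The "standardize according to an ontology" step can be pictured as a lookup from each hospital's local vocabulary onto shared concepts, so a visiting learning bot can query every station the same way. The term URIs below are illustrative placeholders, not real ontology codes:

```python
# Local (field, value) pairs mapped onto shared ontology terms.
# In practice these would be codes from a published ontology; the
# "onto:" terms here are made up for illustration.
ONTOLOGY = {
    ("gender", "m"): "onto:Male",
    ("gender", "male"): "onto:Male",
    ("gender", "f"): "onto:Female",
    ("stage", "4"): "onto:StageIV",
    ("stage", "iv"): "onto:StageIV",
}

def standardize(record):
    """Replace local values by ontology terms; unmapped values are marked."""
    out = {}
    for field, value in record.items():
        key = (field, str(value).strip().lower())
        out[field] = ONTOLOGY.get(key, f"onto:UNMAPPED({value})")
    return out

# Two hospitals coding the same patient differently now agree:
a = standardize({"gender": "M", "stage": "IV"})
b = standardize({"gender": "male", "stage": "4"})
```

Once both stations emit the same terms, one learning application can run against either without site-specific code.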
17. 17
Even more technical details
• De-identification
• Semantic web, linked data
• Imaging/DICOM data & clinical data stream
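A minimal sketch of the de-identification step, assuming a salted one-way hash as pseudonym; field names are illustrative and a real pipeline would cover far more identifiers (dates, free text, DICOM headers):

```python
import hashlib

DIRECT_IDENTIFIERS = {"name", "address", "date_of_birth"}

def deidentify(record, salt):
    """Drop direct identifiers and replace the patient ID with a salted
    one-way hash. The salt never leaves the hospital, so outside parties
    cannot reverse or rebuild the pseudonym."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    token = hashlib.sha256((salt + str(record["patient_id"])).encode()).hexdigest()
    out["patient_id"] = token[:16]
    return out

rec = deidentify({"patient_id": "12345", "name": "J. Doe", "stage": "IV"},
                 salt="local-secret")
```

Keeping the salt inside the hospital means the same patient always gets the same pseudonym locally (so records can still be linked across systems) while the mapping stays irreversible outside.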
20. 20
How much data do you need?
• Rule of thumb: min. 10 events per input feature
• 200 NSCLC patients
• 25% survival at two years
• 50 events
• 10 input features
• More is better
Source: vitalflux.com (2017)
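The rule of thumb is simple arithmetic. With the slide's numbers (200 NSCLC patients, 25% in the event class, i.e. 50 events), 10 events per feature supports a 5-feature model; a 10-feature model would need roughly twice the cohort:

```python
import math

def max_features(n_patients, event_rate, events_per_feature=10):
    """Largest model the 10-events-per-feature rule of thumb supports."""
    return int(n_patients * event_rate) // events_per_feature

def required_patients(n_features, event_rate, events_per_feature=10):
    """Patients needed for 10 events per feature at a given event rate."""
    return math.ceil(n_features * events_per_feature / event_rate)

supported = max_features(200, 0.25)       # 50 events -> 5 features
needed = required_patients(10, 0.25)      # 10 features -> 400 patients
```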
22. 22
Considerations for machine learning
• Discrimination (AUC)
• Calibration (Brier)
• Interpretability (black box vs. transparent)
• Can it handle low data quality (of training and validation)?
• Can it be learned in a distributed setting?
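The two headline metrics are easy to compute by hand. A self-contained sketch of ROC AUC (via its rank-comparison formulation) and the Brier score, on a made-up four-patient example:

```python
def auc(y_true, y_prob):
    """Discrimination: the chance that a randomly chosen event patient gets
    a higher predicted risk than a randomly chosen non-event patient
    (the rank-comparison formulation of ROC AUC)."""
    pos = [p for p, t in zip(y_prob, y_true) if t == 1]
    neg = [p for p, t in zip(y_prob, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def brier(y_true, y_prob):
    """Brier score: mean squared error of predicted probabilities
    (0 is perfect; 0.25 is an uninformative coin-flip model).
    Note it mixes calibration and discrimination into one number."""
    return sum((p - t) ** 2 for p, t in zip(y_prob, y_true)) / len(y_true)

y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
```

Here `auc(y, p)` is 0.75 (three of four event/non-event pairs are ranked correctly) and `brier(y, p)` is about 0.158.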
23. 23
Choose already
Simple and quick, but need complete data
• Logistic regression
• Support Vector Machines
Intuitive and can handle missing data
• Bayesian Networks
All can be learned in a distributed setting
Review pending
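The "can handle missing data" property of Bayesian models can be shown with the simplest special case, naive Bayes: a feature whose value is unknown is simply left out of the product, which (because the conditional probabilities of its values sum to one) is exactly marginalizing it out. The classes, features, and numbers below are made up for illustration:

```python
def predict(priors, cond, record):
    """Posterior over classes; a feature with value None is skipped,
    which for naive Bayes equals marginalizing the missing value out."""
    scores = {}
    for c, prior in priors.items():
        s = prior
        for feat, val in record.items():
            if val is not None:
                s *= cond[c][feat][val]
        scores[c] = s
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

# Hypothetical two-class model: 2-year death vs survival, with tumour
# stage and WHO performance status as features
priors = {"dead": 0.25, "alive": 0.75}
cond = {
    "dead":  {"stage": {"early": 0.2, "late": 0.8}, "who": {"0-1": 0.4, "2+": 0.6}},
    "alive": {"stage": {"early": 0.7, "late": 0.3}, "who": {"0-1": 0.8, "2+": 0.2}},
}
# WHO status missing -> prediction still works, from stage alone
posterior = predict(priors, cond, {"stage": "late", "who": None})
```

Logistic regression and SVMs, by contrast, need every input filled in (or imputed) before they can score a patient.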
25. 25
Model validation
• Discrimination: Is the model able to classify the population into two
or more groups with different observed survival?
• Calibration: Is the estimated probability of survival equal to the
observed survival probability?
• Clinical usefulness: Is the data on which the model is based
representative for my patient and is the predicted outcome clinically
relevant for my patient?
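Discrimination and calibration can be checked together by grouping patients on predicted risk and comparing predicted with observed event rates. A minimal two-group sketch on hypothetical data (a real validation would use more groups and confidence intervals):

```python
def calibration_by_group(y_true, y_prob, cutoff=0.5):
    """Split into low/high predicted-risk groups, then compare mean
    predicted risk with the observed event rate per group.
    Discrimination: the groups should differ in observed outcome.
    Calibration: predicted should be close to observed in each group."""
    groups = {"low": [], "high": []}
    for t, p in zip(y_true, y_prob):
        groups["low" if p < cutoff else "high"].append((t, p))
    result = {}
    for name, rows in groups.items():
        if rows:
            predicted = sum(p for _, p in rows) / len(rows)
            observed = sum(t for t, _ in rows) / len(rows)
            result[name] = (predicted, observed)
    return result

# Toy cohort: 1 = event (e.g. death at two years); probabilities made up
groups = calibration_by_group([0, 0, 1, 0, 1, 1],
                              [0.1, 0.2, 0.3, 0.6, 0.7, 0.9])
```

Clinical usefulness cannot be computed this way at all: it requires judging whether the development cohort resembles the patient in front of you.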
31. 31
Learning objectives
After the lecture, attendees should be able to
• Name the major sources of cancer data and their absolute and relative size
• Understand the challenges of sharing data and solutions to these
• Itemize steps in the methodology to go from data to models
• Appraise papers that describe models incl. using TRIPOD
32. 32
Acknowledgements
• Fudan Cancer Center, Shanghai, China
• Varian, Palo Alto, CA, USA
• Siemens, Malvern, PA, USA
• RTOG, Philadelphia, PA, USA
• MAASTRO, Maastricht, Netherlands
• Policlinico Gemelli, Roma, Italy
• UH Ghent, Belgium
• UZ Leuven, Belgium
• Radboud, Nijmegen, Netherlands
• University of Sydney, Australia
• University of Michigan, Ann Arbor, USA
• Liverpool and Macarthur CC, Australia
• CHU Liege, Belgium
• Uniklinikum Aachen, Germany
• LOC Genk/Hasselt, Belgium
• Princess Margaret CC, Canada
• The Christie, Manchester, UK
• UH Leuven, Belgium
• State Hospital, Rovigo, Italy
• Illawarra Shoalhaven CC, Australia
• Catharina Zkh, Eindhoven, Netherlands
• Philips, Eindhoven, Netherlands
More info on: www.predictcancer.org www.cancerdata.org
www.eurocat.info www.mistir.info
33. Thank you for your attention
Andre Dekker
Department of Radiation Oncology (MAASTRO)
GROW - Maastricht University Medical Centre +
Maastricht, The Netherlands