Quantitative Image Analysis in the Life Sciences: Using imaging to predict cancer survival
1. Quantitative Image Analysis in the Life Sciences
Using imaging to predict cancer survival
Simon Li
University of Oxford
2. Aim
Ewing's sarcoma is a very rare cancer affecting mainly children,
with a survival rate of around 65%.
We are interested in finding biomarkers which are either correlated
with survival, or can help select the optimal treatment.
Why?
Traditional analysis of microscope images is a time-consuming
manual process.
Automated analyses are consistent, repeatable, less prone to
human bias, and can uncover subtle effects.
What's novel?
Translating basic laboratory research into a clinically useful tool.
Single cell analysis: Most studies average over the entire sample,
so information from individual cells is lost.
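A toy illustration of why averaging loses information. The intensity values below are made up for the example and are not taken from the study: two samples have the same mean, yet only one contains a strongly responding subpopulation of cells.

```python
from statistics import mean

# Hypothetical per-cell marker intensities (arbitrary units).
# Sample A: every cell responds moderately.
sample_a = [5.0, 5.1, 4.9, 5.0, 5.0, 5.0]
# Sample B: half of the cells respond strongly, half hardly at all.
sample_b = [0.1, 9.9, 0.0, 10.0, 0.2, 9.8]


def fraction_above(values, threshold):
    """Fraction of cells whose intensity exceeds the threshold."""
    return sum(v > threshold for v in values) / len(values)


# The sample-level averages are indistinguishable ...
mean_a = mean(sample_a)
mean_b = mean(sample_b)

# ... but a per-cell summary separates the two samples.
high_a = fraction_above(sample_a, 8.0)
high_b = fraction_above(sample_b, 8.0)

print(f"mean A = {mean_a:.2f}, mean B = {mean_b:.2f}")
print(f"strongly responding cells: A = {high_a:.2f}, B = {high_b:.2f}")
```

The threshold of 8.0 is arbitrary; the point is only that a per-cell statistic can differ where the sample mean does not.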
3. Developing the tools
Cells grown in tissue culture were used to develop the
image segmentation and analysis algorithms.
Each set of cells was stimulated with a different dose of drug
and imaged to observe the change in appearance.
4. Segmentation
Novel multi-phase level set and random walker segmentation
algorithms were developed to identify the boundaries of each cell in a
tightly packed clump.
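To see why clumped cells need more than a simple approach, here is a minimal baseline sketch. It is not the level set or random walker method from this work, just plain connected-component labelling on a made-up binary mask. Two cells that touch are merged into a single object, which is the failure the algorithms above are meant to address.

```python
from collections import deque


def label_components(mask):
    """Label 4-connected foreground regions of a binary mask (list of lists).

    Returns the label image and the number of regions found.
    """
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    current = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and labels[r][c] == 0:
                # Start a new region and flood-fill it breadth-first.
                current += 1
                labels[r][c] = current
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and labels[ny][nx] == 0):
                            labels[ny][nx] = current
                            queue.append((ny, nx))
    return labels, current


# Hypothetical masks: two cells with a gap between them ...
separated = [
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
]
# ... and the same two cells in contact.
touching = [
    [1, 1, 1, 1],
    [1, 1, 1, 1],
]

_, n_separated = label_components(separated)
_, n_touching = label_components(touching)
print(n_separated, n_touching)
```

With the gap the baseline counts two cells; once the cells touch it counts one, so any per-cell feature computed from that region would mix both cells.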
Classification
Image features (including intensities, shapes, etc.) can be obtained from
each cell.
Approximately 50 features/cell, 400 cells/image.
Machine learning (Random Forests) trained to classify cells and images
based on these features.
Figure label: Leave-one-out cross-validation.
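A minimal sketch of leave-one-out cross-validation, the evaluation scheme named in the figure. To keep the example dependency-free, a 1-nearest-neighbour classifier stands in for the Random Forest; the feature values and the "untreated"/"treated" labels are invented for illustration.

```python
import math


def nearest_neighbour_predict(train_x, train_y, query):
    """Return the label of the training point closest to the query (1-NN)."""
    best = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return train_y[best]


def leave_one_out_accuracy(features, labels):
    """Hold out each sample once, fit on the rest, and score the prediction."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if nearest_neighbour_predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)


# Hypothetical data: two features per cell, two classes.
features = [
    (1.0, 1.2), (0.9, 1.0), (1.1, 0.8),   # untreated
    (3.0, 3.1), (3.2, 2.9), (2.9, 3.3),   # treated
]
labels = ["untreated"] * 3 + ["treated"] * 3

accuracy = leave_one_out_accuracy(features, labels)
print(f"leave-one-out accuracy: {accuracy:.2f}")
```

The same loop applies unchanged to any classifier with a fit-then-predict step; only `nearest_neighbour_predict` would be swapped out.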
5. Tumour biopsies
Can we apply our algorithms to real tumour biopsies?
The data
Approximately 100 biopsies of varying quality, along with patient
survival (alive / time to death) and other clinical indicators.
Sources of biopsies vary, and there is a large variation in quality.
6. Metastasis prediction
Using the Random Forest feature importances, we can find several
features that are predictive of metastasis (spreading of the cancer
away from the original site).
Visualised as dotted lines to the left of the
corresponding solid lines. A/B/C are
separate sources of data.
7. Survival prediction
Finally, can we predict survival times?
This is made more complicated by incomplete data (surviving patients
do not have a time of death), so we use Random Survival Forests.
We can identify features
which appear to be
correlated with survival.
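The incomplete-data problem can be illustrated with a much simpler tool than a Random Survival Forest. The sketch below is a Kaplan-Meier estimator, not the method used in this work. It shows how patients who are still alive (censored) contribute to the number at risk without counting as deaths. The follow-up times are invented.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each observed death time.

    times  : follow-up time for each patient
    events : True if the patient died at that time,
             False if censored (still alive at last follow-up)
    """
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        # Everyone followed for at least t is still at risk at time t.
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, died in zip(times, events) if ti == t and died)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
    return curve


# Hypothetical follow-up times in months; False marks surviving patients.
times = [6, 12, 12, 20, 30]
events = [True, True, False, True, False]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t = {t:>2} months, estimated survival = {s:.2f}")
```

Dropping the censored patients instead would discard the information that they survived at least as long as their follow-up time, biasing the estimate downwards.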
8. But ...
There is a potential problem. Using multidimensional scaling
(from the Random Forest proximity matrix) we can see
that the three datasets are partially separated.
This means that the
normalisation
procedure has failed to
remove all systematic
errors in the data.
These errors are most
likely due to variations
in the protocol carried
out by different
experimenters.
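A crude numerical counterpart to the visual check above, as a sketch only: given a proximity matrix and the source of each sample, compare the mean proximity within sources to the mean proximity between sources. The matrix and labels below are hypothetical, and this is not the multidimensional scaling used for the plot.

```python
from itertools import combinations
from statistics import mean


def batch_separation(proximity, batches):
    """Compare mean proximity within and between data sources.

    proximity : symmetric matrix (list of lists) with values in [0, 1]
    batches   : source label for each sample, e.g. "A", "B", "C"
    """
    within, between = [], []
    for i, j in combinations(range(len(batches)), 2):
        if batches[i] == batches[j]:
            within.append(proximity[i][j])
        else:
            between.append(proximity[i][j])
    return mean(within), mean(between)


# Hypothetical proximity matrix for four samples from two sources.
proximity = [
    [1.0, 0.8, 0.3, 0.2],
    [0.8, 1.0, 0.4, 0.3],
    [0.3, 0.4, 1.0, 0.7],
    [0.2, 0.3, 0.7, 1.0],
]
batches = ["A", "A", "B", "B"]

within_mean, between_mean = batch_separation(proximity, batches)
print(f"within = {within_mean:.2f}, between = {between_mean:.2f}")
```

A within-source mean well above the between-source mean is consistent with a residual source effect, although it could also reflect genuine differences between the patient groups.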
9. Summary
Developed novel segmentation
algorithms for handling clumped
cells.
Carried out initial work in a new area
of research: the responses of cell
clumps.
Built a framework for integrating
single cell imaging data with analysis
using machine learning.
Identified problems related to lack of
data normalisation.
PhD supervisors
Prof J Alison Noble
Dr James G Wakefield
Contact: Simon Li
someone@pitpe.co.uk