Artificial Intelligence (AI) studies and designs intelligent agents, i.e. systems that perceive their environment and take actions that maximize their chances of success. In microscopy, success often means that automated image analysis efficiently detects phenotypes, as in biological screens, or retrieves diagnostically relevant statistics from images, as in medical diagnosis. In this talk I will present two applications, one supporting high-content screening and one supporting the diagnosis of cervical cancer, in which subdomains of AI such as Evolutionary Algorithms, Neural Networks and Machine Learning techniques were applied.
Artificial Intelligence in High Content Screening and Cervical Cancer Diagnosis
1. Artificial Intelligence in High-Content
Screening and Cervical Cancer
Diagnosis
Lukasz Miroslaw, PhD
lukasz.miroslaw@uzh.ch
Organic Chemistry Institute
Grid Computing Competence Center
University of Zurich, Switzerland
ETH LMC, 17.10.2012
2. Table of Contents
Introduction:
- Why Artificial Intelligence?
- Loss-of-function screens.
- Cervical Cancer Diagnosis.
GC3Pie: Software for Workflow-Management in High
Content Screening.
Demo.
Conclusions.
3. Why Artificial Intelligence?
Algorithm design is difficult due
to the high number of parameters that
must be estimated.
Some numbers:
Bridge: 52! (≈8.07×10^67) =
80,658,175,170,943,878,571,660,636,856,403,766,
975,289,505,440,883,277,824,000,000,000,000
Chess: 4.52×10^46 is a proven upper bound for the
number of legal chess positions.
Cosmology: There are 10^24 stars.
The only reasonable approach is to use AI.
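The figure quoted for 52! can be checked directly. A minimal sketch using only the Python standard library; the variable name is illustrative:

```python
import math

# Number of distinct orderings of a standard 52-card deck.
deck_orderings = math.factorial(52)

print(f"{deck_orderings:,}")           # full value with thousands separators
print(f"approx. {deck_orderings:.2e}")  # scientific notation
```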
4. Artificial Intelligence
Search and optimization
Probabilistic methods for uncertain
reasoning
Logic
Languages
Control theory
Neural networks
Classifiers and statistical learning methods
… and more
5. esiRNA: Knockdown efficacy
qRT-PCR analyses 24
hours after transfection of
HeLa cells with indicated
esiRNAs are shown.
Because of the complex
mixture of different
siRNAs all targeting the
same mRNA, the esiRNA
pool typically produces
excellent silencing of the
transcripts.
Credits: Prof.
Frank Buchholz
6. Objective and Assay Setup
Objective: Automated estimation of Mitotic Index from time-lapse movies.
Segmentation and Classification of three types of HeLa cells: normal,
apoptotic and mitotic cells.
Methodology: Multimodal Image Analysis.
Fig. GFP-tagged HeLa cells imaged with
Positive (Left) and Negative (Right) Phase-
Contrast and Fluorescence Microscopy. TDS
(right) and Kyoto (bottom).
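The Mitotic Index named in the objective is the fraction of cells that are in mitosis. A minimal sketch of the calculation from per-cell class labels; the label strings and the choice to count apoptotic cells in the denominator are assumptions made for illustration, not taken from the slides:

```python
from collections import Counter


def mitotic_index(labels):
    """Fraction of cells labelled 'mitotic' among all classified cells."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no cells were classified")
    return counts["mitotic"] / total


# Made-up labels for a single frame of a time-lapse movie.
frame_labels = ["normal"] * 17 + ["mitotic"] * 2 + ["apoptotic"]
print(mitotic_index(frame_labels))
```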
7. Cell Model
Mitotic cells: cell boundaries well
distinguishable, rounded shape.
Normal and apoptotic cells have slightly
different levels of GFP signal; in metaphase the
signal gets stronger.
Fig. Distribution of Features for TDS HeLa cells.
8. Apoptotic and Normal Cells
Detection of GFP signal: local background
subtraction with rolling-ball algorithm [1]
followed by watershed.
Fig. Validation: Specificity: 98% measured on 578 cells.
Classification of detected objects as
apoptotic or normal cells.
Fig. Discriminant Function Analysis, linear vs. quadratic classifiers.
Best specificity: 71% measured on 6 randomly picked images from
the training set.
[1] Sternberg S., “Biomedical Image Processing”, IEEE Computer, January 1983.
9. Detection of Mitotic Cells
Cross-correlation based approach [3]:
1. Given N cell models g_i and a target image f:
2. Cross-correlate f with g_i in Fourier
space, i = 1,…,N.
3. Validate correlation peaks.
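Steps 1 and 2 can be sketched with NumPy's FFT. This is an illustration under assumptions, not the implementation from [3]: the correlation is circular and unnormalized, both inputs are mean-subtracted, and the peak validation of step 3 is left out. The function names and the toy data are hypothetical.

```python
import numpy as np


def cross_correlate_fft(image, template):
    """Circular cross-correlation of image with template via the FFT."""
    # Zero-pad the template to the image size so both spectra match.
    padded = np.zeros_like(image, dtype=float)
    h, w = template.shape
    padded[:h, :w] = template
    # Correlation theorem: corr = IFFT( FFT(f) * conj(FFT(g)) ).
    spectrum = np.fft.fft2(image) * np.conj(np.fft.fft2(padded))
    return np.real(np.fft.ifft2(spectrum))


def best_match(image, templates):
    """Return (template index, row, col, score) of the strongest peak."""
    best = None
    for i, g in enumerate(templates):
        corr = cross_correlate_fft(image - image.mean(), g - g.mean())
        row, col = np.unravel_index(np.argmax(corr), corr.shape)
        if best is None or corr[row, col] > best[3]:
            best = (i, int(row), int(col), float(corr[row, col]))
    return best


# Toy data (made up): one 3x3 blob whose top-left corner is at row 5, column 7.
blob = np.array([[0, 1, 0], [1, 2, 1], [0, 1, 0]], dtype=float)
corners = np.array([[1, 0, 0], [0, 0, 0], [0, 0, 1]], dtype=float)
image = np.zeros((16, 16))
image[5:8, 7:10] = blob
index, row, col, score = best_match(image, [blob, corners])
print(index, row, col)
```

Because the score is not normalized, a high-energy model can outscore the correct one; a normalized correlation or a separate validation step would be needed in practice.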
[3] Miroslaw L., Chorazyczewski A., Correlation-based method for automatic mitotic cell detection in phase contrast
microscopy, Proc. 4th Int. Conf. Computer Recognition Systems CORES'05, pp. 627-635, Springer-Verlag Berlin
Heidelberg 2005.
10. Evolution-Driven Validation
No Teaching.
Just one parameter (σ)
Specificity: 81%
[4] Miroslaw L., Chorazyczewski A., Buchholz F., Kittler R., EA validation method in detection of mitotic cells, Proc. 8th
National Conference on Evolutionary Computation and Global Optimization, pp. 157-163, Korbielow 2005.
11. Summary
Fig. Estimated Mitotic Index for wild-type cells
(blue) and cells with CDC16 knocked
down.
One of the developed methods was used in a genome-
scale screen.
Genome-scale RNA-
mediated interference screen
in HeLa cells to identify
human genes that are
important for cell division
[5].
Cited 133 times.
[5] Kittler R, Pelletier L, Heninger AK, Slabicki M, Theis M, Miroslaw L, Poser I, Lawo S, Grabner H, Kozak K, Wagner
J, Surendranath V, Richter C, Bowen W, Habermann B, Hyman AA, Buchholz F. (2007) Genome-wide RNAi profiling of
cell cycle progression in human tissue culture cells. Nat Cell Biol. 9(12): 1401-12.
12. 5 years later …
Objective: Estimation of Mitotic Index
from negative phase contrast images.
Pre-processing: Shading Correction to
reduce uneven illumination.
Nuclei Segmentation: isodata algorithm
[1] followed by dilation to segment the
nuclei on Fluorescence Image.
Classification: Neural Network.
Learning Set: 20x20 px images originating
from detected nuclei.
For each sub-image 11 Texture Features were
calculated (1st Order Statistics).
[1] T.W. Ridler, S. Calvard, Picture thresholding using an iterative selection method, IEEE Trans. System, Man and
Cybernetics, SMC-8 (1978) 630-632.
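The iterative selection idea behind the isodata threshold of [1] can be sketched in a few lines. This is a simplified illustration on a flat list of intensities; the tolerance `tol`, the starting value (the global mean) and the toy pixel values are assumptions, and the dilation step is not shown.

```python
import statistics


def isodata_threshold(pixels, tol=0.5):
    """Iterate until the threshold equals the midpoint of the two class means."""
    threshold = statistics.fmean(pixels)  # start from the global mean
    while True:
        below = [p for p in pixels if p <= threshold]
        above = [p for p in pixels if p > threshold]
        if not below or not above:
            return threshold  # all pixels fall in one class; nothing to split
        new_threshold = (statistics.fmean(below) + statistics.fmean(above)) / 2
        if abs(new_threshold - threshold) < tol:
            return new_threshold
        threshold = new_threshold


# Made-up intensities: dark background around 11, bright nuclei around 200.
pixels = [10, 12, 11, 13, 200, 210, 205, 12, 11, 198]
t = isodata_threshold(pixels)
foreground = [p > t for p in pixels]
print(t)
```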
13. Neural Networks and Markov Chains
Artificial Neural Network
11 input neurons
17 hidden neurons (rule of thumb)
3 output neurons representing apoptotic cells,
mitotic cells and background.
Back-Propagation based learning.
Stop Condition: Learning Error 1e-7.
Fig: Artificial Neural Network with Back-Propagation
Learning scheme.
Image Source: Theodor Tanner Jr.
Post-processing motivated by
Markov-chain transition probability
estimation.
Transition probabilities a_ij estimated from the training set.
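Estimating transition probabilities from labelled sequences amounts to counting transitions and normalizing each row. A minimal sketch; the state names and the toy sequences are hypothetical and only illustrate the counting, not the post-processing used in the program:

```python
def estimate_transitions(sequences, states):
    """Maximum-likelihood estimate of a[i][j] = P(next = j | current = i)."""
    counts = {s: {t: 0 for t in states} for s in states}
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    probs = {}
    for s in states:
        total = sum(counts[s].values())
        # A state never seen as 'current' gets an all-zero row.
        probs[s] = {t: (counts[s][t] / total if total else 0.0) for t in states}
    return probs


# Made-up class sequences of two tracked cells over consecutive frames.
states = ["normal", "mitotic", "apoptotic"]
tracks = [
    ["normal", "normal", "mitotic", "normal"],
    ["normal", "mitotic", "mitotic", "apoptotic"],
]
a = estimate_transitions(tracks, states)
print(a["normal"])
```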
15. Summary
Program Features
• Off-line teaching module.
• Very fast classification.
• XML-based statistics generation.
• Automated plot generation.
• Run/Pause Button.
Performance: Click: http://goo.gl/mZpRU
Sensitivity: 82%
Specificity: 94%
Segmentation: Sensitivity: 85%
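For reference, sensitivity and specificity follow from the counts of a confusion matrix. A minimal sketch; the counts below are invented for the example and are not the counts behind the figures on this slide:

```python
def sensitivity(true_pos, false_neg):
    """Share of actual positives that were detected: TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of actual negatives that were rejected: TN / (TN + FP)."""
    return true_neg / (true_neg + false_pos)


# Invented counts, for illustration only.
sens = sensitivity(true_pos=8, false_neg=2)
spec = specificity(true_neg=9, false_pos=1)
print(sens, spec)
```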
Criticism:
• Mitotic arrest is estimated. Detection of ALL cells
must be done to provide better estimation.
• Time-consuming teaching.
Acknowledgments: Karol Radziszewski, Krzysztof Sikora, Marek
Skowroński, Krzysztof Stępień,
Wroclaw University of Technology 2011, Poland.
16. Cervical Cancer Diagnosis
• Worldwide, cervical cancer is the second
most common and the fifth deadliest
cancer in women.
• HPV vaccines are still being
investigated. The Pap test is a long
examination (2-3 weeks).
• Phase Contrast allows for immediate
examination.
Fig: Typical image with epithelial cells.
Image Source: Dr Grzegorz Glab, Opole Hospital of Gynaecology,
Poland.
Objective: automated
segmentation of epithelial cells and
detection of atypical cell nuclei.
17. Algorithm
1. 80 Texture Features were computed
for each image subregion.
2. Selection of the most relevant features.
3. Post-processing.
4. Active Contour in cell membrane
detection.
Fig: Typical image with epithelial cells.
Image Source: Dr Grzegorz Glab, Opole Hospital of Gynaecology,
Poland.
18. How to Limit Number of Features?
Fig. Mean Classification Rate for different numbers of features.
Sequential Forward Floating Selection Scheme and 10-fold cross-validation
were used.

                                        Metric distance   B. FLD Classifier   Scatter Matrices
Classification Error for 20 features    15.6%             16.9%               15.2%
19. Classification
Objective: assign each subimage to one of the
classes: background, cell membrane, epithelial
cell.
1. k-Nearest Neighbor Classifier
2. Kernel Fisher Discriminant
3. Linear Fisher Discriminant
- a linear combination of features that best
separates two or more classes
Problem: How to estimate parameters?
Mean Classification Error (MCE) was estimated with cross-validation.
For k=20, MCE=15.2%; for h=5.5, MCE=12.9%.
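How a parameter such as k can be chosen by cross-validation is sketched below for the k-Nearest Neighbor classifier. The sketch assumes leave-one-out cross-validation and Euclidean distance for brevity; the toy feature vectors and labels are made up and the function names are illustrative.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k training samples closest to query."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def loo_error(samples, k):
    """Leave-one-out estimate of the mean classification error."""
    errors = 0
    for i, (features, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        if knn_predict(rest, features, k) != label:
            errors += 1
    return errors / len(samples)


# Made-up 2-D feature vectors for two well-separated classes.
samples = [
    ((0, 0), "background"), ((0, 1), "background"),
    ((1, 0), "background"), ((1, 1), "background"),
    ((5, 5), "membrane"), ((5, 6), "membrane"),
    ((6, 5), "membrane"), ((6, 6), "membrane"),
]
errors_by_k = {k: loo_error(samples, k) for k in (1, 3, 5, 7)}
best_k = min(errors_by_k, key=errors_by_k.get)
print(errors_by_k, best_k)
```

With too large a k the vote is dominated by the other class, which is why the error has to be estimated rather than assumed.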
20. Final Classification
Fig: Decision Matrix for kNN, FLD and KFD (best 86.8% classification rate).
Cell Membrane Validation
1. Ask a biologist!
2. Active Contour Initialization.
3. Calculate Gradient Flow and
iteratively adapt the contour to the
membrane.
22. Nuclei Detection
Fig. Four-fold cross-validation on a training set with 57 pathological nuclei and 2379 other oval
objects. Ten classifiers have been tested. Specificity: 95%, sensitivity: 96%. (Marcin Smereka)
[6] Schilling T.*, Miroslaw L.*, Glab G., Smereka M., Towards rapid cervical cancer diagnosis:
automated detection of cells in phase contrast images with texture features and active contours,
Int J Gynec Cancer 2007, 17(1):118-26. * First and second author contributed equally to the work.
23. Problems with HCS
Typical image-based assays generate
hundreds of thousands of images.
Image analysis is often unique and
composed of different algorithms. They
form sequential/parallel workflows or
their combination.
Algorithms have many parameters.
Estimation of the parameters is a big
challenge.
Control and management of image analysis is a highly
complex problem.
Common Approach: in-house scripts that
call image processing modules, which leads to problems:
Portability: Cannot run on a different cluster without
rewriting all the scripts.
Code reuse: Scripts are often tightly tied to a certain
purpose, so they are difficult to reuse.
Heavy maintenance: the better a script does its job,
the more you’ll find yourself adding “generic” features
and handling requests from other users.
24. GC3Pie for HCS
by Grid Computing Competence Center
GC3Pie is a suite of Python classes (and command-line tools built upon them) to
aid in submitting and controlling batch jobs to clusters and grid resources
seamlessly.
Building blocks by which a dynamic workflow can be quickly developed.
GC3Libs functionality: submit/monitor/kill a job,
retrieve output, etc.
Core operations: submit, update state, retrieve
(a snapshot of) output, cancel job.
Additional features:
• Get access to the Grid (e.g., authentication step)
• Prepare files for submission.
• Re-submit failed jobs.
• Monitor job status (loop)
• Retrieve results.
• Postprocess and display.
http://gc3pie.googlecode.com
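The core operations listed above (submit, update state, retrieve output, re-submit failed jobs) can be illustrated with a toy job manager. This sketch deliberately does NOT use the GC3Pie API: `ToyJob`, `run_all`, the state names and `count_cells` are hypothetical, and jobs run in local threads instead of on a cluster or grid.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class ToyJob:
    """Hypothetical stand-in for a batch job; this is NOT the GC3Pie API."""

    def __init__(self, func, *args):
        self.func, self.args = func, args
        self.state = "NEW"
        self.attempts = 0
        self._future = None

    def submit(self, pool):
        self.attempts += 1
        self._future = pool.submit(self.func, *self.args)
        self.state = "SUBMITTED"

    def update_state(self):
        if self._future is not None and self._future.done():
            failed = self._future.exception() is not None
            self.state = "FAILED" if failed else "TERMINATED"
        return self.state

    def retrieve_output(self):
        return self._future.result()


def run_all(jobs, max_attempts=2, poll_interval=0.01):
    """Submit every job, poll until all finish, re-submit failures."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        for job in jobs:
            job.submit(pool)
        pending = list(jobs)
        while pending:  # monitoring loop
            time.sleep(poll_interval)
            for job in list(pending):
                state = job.update_state()
                if state == "FAILED" and job.attempts < max_attempts:
                    job.submit(pool)  # re-submit a failed job
                elif state in ("TERMINATED", "FAILED"):
                    pending.remove(job)
    return jobs


def count_cells(image_id):
    return image_id * 10  # placeholder for a real image-analysis step


jobs = run_all([ToyJob(count_cells, i) for i in range(3)])
results = [job.retrieve_output() for job in jobs]
print(results)
```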
25. Conclusions
Image Analysis can be complex, e.g. the search space has too
many parameters. Artificial Intelligence may be helpful. Some experience is needed
to adapt AI to a given problem.
A few applications of AI were presented: Classifiers and statistical learning methods
(Non-linear and linear Classifiers), Search and optimization (Evolutionary
Algorithm), Probabilistic methods for uncertain reasoning (Markov Chain), neural
networks (NN with Back-Propagation Learning).
Management and control of Image Analysis in High Content Screening can be
simpler (with GC3Pie).
http://gc3pie.googlecode.com
26. Hard vs. Soft Selection
Hard selection: the best
individuals always win.
Pros: local minima are
located easily.
Cons: crossing saddles
almost impossible.
Soft selection: probability of
selection depends on the
fitness.
Pros: better saddle crossing.
Cons: Parameter-dependent
method.
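The two selection schemes can be contrasted in a short sketch. Truncation selection stands in for hard selection and fitness-proportional (roulette-wheel) sampling for soft selection; both are common choices assumed here for illustration, and the function names and toy population are hypothetical.

```python
import random


def hard_selection(population, fitness, n):
    """Truncation: keep the n fittest individuals, deterministically."""
    return sorted(population, key=fitness, reverse=True)[:n]


def soft_selection(population, fitness, n, rng=random):
    """Fitness-proportional sampling with replacement.

    Fitness values must be non-negative and not all zero.
    """
    weights = [fitness(individual) for individual in population]
    return rng.choices(population, weights=weights, k=n)


# Toy population where the fitness of an individual is its own value.
population = [1, 2, 3, 4, 5, 6]
hard = hard_selection(population, fitness=lambda x: x, n=3)
soft = soft_selection(population, fitness=lambda x: x, n=3, rng=random.Random(0))
print(hard, soft)
```

Soft selection lets weaker individuals survive occasionally, which is what makes saddle crossing possible.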
28. Evolutionary Algorithm
Individuals are the legal
solutions to our problem.
They form a population that
'evolves' in time and adapts
to the environment.
Fitness function is a
measure of the adaptation.
Diversity is crucial. Finding
extrema and saddle points
is more frequent than with
gradient searches.
Operators that drive the
evolution:
Selection, Reproduction
(Recombination), Mutation.
Baldrige Group, group meeting
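The three operators can be combined into a minimal evolutionary loop. This is a sketch under assumptions, not the algorithm used in the talk: individuals are single real numbers, selection keeps the fitter half, recombination averages two parents and mutation adds Gaussian noise with standard deviation `sigma`. All names and default values are illustrative.

```python
import random


def evolve(fitness, pop_size=30, generations=60, sigma=0.3, seed=0):
    """Maximize fitness over real numbers with a toy evolutionary algorithm."""
    rng = random.Random(seed)
    population = [rng.uniform(-10, 10) for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the fitter half of the population as parents.
        parents = sorted(population, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            child = (a + b) / 2             # recombination
            child += rng.gauss(0.0, sigma)  # mutation
            children.append(child)
        population = parents + children
    return max(population, key=fitness)


# Toy fitness function with a single maximum at x = 3.
best = evolve(lambda x: -(x - 3.0) ** 2)
print(best)
```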
29. Cross-over
Recombination:
Mating process: two
parents create offspring.
The offspring consists of
genetic material from
both parents.
Weaker offspring tend to
die out in time.
Goal: variation allows the
offspring to search out
different available niches,
find better fitness values
ergo better solutions.
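One common way to implement recombination is one-point crossover on list-encoded genomes. The sketch below is one possible choice, assumed for illustration; the genome encoding and function name are hypothetical.

```python
import random


def one_point_crossover(parent_a, parent_b, rng=random):
    """Swap the tails of two equal-length genomes at a random cut point."""
    if len(parent_a) != len(parent_b) or len(parent_a) < 2:
        raise ValueError("parents must have equal length of at least 2")
    cut = rng.randrange(1, len(parent_a))  # cut strictly inside the genome
    child_1 = parent_a[:cut] + parent_b[cut:]
    child_2 = parent_b[:cut] + parent_a[cut:]
    return child_1, child_2


# Toy parents chosen so the origin of every gene is visible.
mother = [0, 0, 0, 0, 0, 0]
father = [1, 1, 1, 1, 1, 1]
child_1, child_2 = one_point_crossover(mother, father, rng=random.Random(1))
print(child_1, child_2)
```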
30. Mutation
Mutation occurs in
nature. Although it
occurs very infrequently,
many believe it is a
main driving force of
evolution. Mutation
often results
in a weaker individual.
Occasionally it
produces a
stronger one.
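In an evolutionary algorithm, mutation is typically a small random change applied with low probability. A minimal sketch for binary genomes; the bit-flip operator and the default `rate` are assumptions chosen for illustration:

```python
import random


def mutate(genome, rate=0.01, rng=random):
    """Flip each bit independently with probability rate."""
    return [1 - gene if rng.random() < rate else gene for gene in genome]


genome = [0, 1, 0, 1]
unchanged = mutate(genome, rate=0.0)  # no gene is ever flipped
inverted = mutate(genome, rate=1.0)   # every gene is flipped
print(unchanged, inverted)
```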
32. Rolling-Ball Algorithm
The Rolling Ball Radius is the radius of curvature of the paraboloid. As a rule of
thumb, for 8-bit or RGB images it should be at least as large as the radius of the
largest object in the image that is not part of the background [2].
[2] Stanley Sternberg, “Biomedical Image Processing”, IEEE Computer, January 1983.