Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infrared Spectroscopic Imaging

Cancer diagnosis remains an essentially human task: almost universally, the process requires extracting tissue (biopsy) and having a human examine its microstructure. To improve diagnoses that rest on limited and inconsistent morphologic knowledge, a new approach has recently been proposed that uses molecular spectroscopic imaging to exploit microscopic chemical composition for diagnosis. In contrast to visible-light imaging, this approach produces very large data sets, since each pixel contains the full molecular vibrational spectrum of all chemical species present. Here, we propose data handling and analysis strategies that enable computer-based diagnosis of human prostate cancer using a novel genetics-based machine learning technique (NAX). We apply this technique to demonstrate both fast learning and accurate classification that, additionally, scale well under parallelization. Preliminary results indicate that this approach can improve current clinical practice in diagnosing prostate cancer.
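The data-handling setting described above can be sketched in a few lines. This is an illustrative outline only: the band count, class names, synthetic spectra, and the nearest-centroid classifier are assumptions standing in for the NAX genetics-based learner, not the authors' implementation. The point it shows is structural — a spectroscopic image is a cube (height × width × bands), and flattening it to (n_pixels, n_bands) reduces diagnosis to independent per-pixel classification, which is why the workload parallelizes well.

```python
import numpy as np

# Illustrative sketch only: dimensions, labels, and the nearest-centroid
# rule are hypothetical stand-ins for the NAX classifier in the paper.
rng = np.random.default_rng(0)

N_BANDS = 128                      # vibrational-spectrum samples per pixel (assumed)
CLASSES = ["benign", "malignant"]

# Synthetic training spectra: each class fluctuates around its own mean spectrum.
means = {c: rng.normal(size=N_BANDS) for c in CLASSES}
train = {c: means[c] + 0.1 * rng.normal(size=(50, N_BANDS)) for c in CLASSES}
centroids = np.stack([train[c].mean(axis=0) for c in CLASSES])

# A tiny 2x3-pixel "image": flatten the cube so each row is one pixel's spectrum.
cube = means["malignant"] + 0.1 * rng.normal(size=(2, 3, N_BANDS))
pixels = cube.reshape(-1, N_BANDS)

# Classify every pixel spectrum by its nearest class centroid; each row of
# `dists` holds one pixel's distance to every class.
dists = np.linalg.norm(pixels[:, None, :] - centroids[None, :, :], axis=2)
labels = np.array(CLASSES)[dists.argmin(axis=1)].reshape(2, 3)
print(labels)
```

Because each pixel is classified independently, the `pixels` array can be split across workers with no coordination, matching the parallel scaling the abstract reports.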
Tackling the Management Challenges of Server Consolidation on Multi-core SystemsThe Linux Foundation
This document discusses server consolidation challenges on multi-core systems. It finds that hypervisor overhead increases significantly under high system load. Frequent context switching accounts for a large portion of hypervisor CPU cycles. Optimizing the credit scheduler to reduce context switching frequency improves performance by lowering hypervisor overhead by 22% and increasing performance per CPU utilization by 15%.
Progress and directions for LAPACK and ScaLAPACK as of 2005. See <a href="http://www.netlib.org/lapack/">LAPACK at Netlib</a> and <a href="http://www.netlib.org/lapack-dev/">lapack-dev</a> for more current information.
Deep learning based gaze detection system for automobile drivers using nir ca...Jaey Jeong
This document summarizes a research paper on developing a deep learning-based gaze detection system for automobile drivers using an NIR camera sensor. The system uses a CNN model to classify the driver's gaze into 17 zones based on facial images. Experimental results show the system achieved over 90% accuracy on both internal and open datasets, outperforming previous methods. The proposed method provides an effective way to monitor driver distraction without compromising safety.
This document describes finite element analysis (FEA) simulations performed on a micromirror design using ANSYS Workbench. Structural and modal analyses were conducted. The structural analysis determined stresses, deformation, and whether the mirror would fail at 5 degrees. The modal analysis identified six resonant frequencies and deformation shapes. The mirror was found to withstand loads but did not rotate properly due to oversupporting.
1. The document discusses using information graphics in health technology assessment to visually present scientific evidence that informs health policy decisions.
2. The author conducted a review of 98 health technology assessment reports from 2003-2007 which found that graphics were used in every report except one, with an average of 0.20 graphics per page compared to 0.58 tables per page.
3. The author also interviewed 5 advisors from NICE who noted that graphics are particularly useful for presenting complex data with multiple outcomes, subgroups, or variables, and when there are time limitations or a need to focus or compare results.
Optimization of parameter settings for GAMG solver in simple solver, OpenFOAM...Masashi Imano
The document summarizes presentations given by Masashi Imano of OCAEL Co. Ltd. at OpenFOAM study meetings for beginners in Kansai and Kanto, Japan. It discusses optimizing parameters for the GAMG solver in OpenFOAM, including the number of cells in the coarsest grid level. Testing on a 16-node SGI cluster showed the optimal range was 32-1024 cells. It also discusses parameters like merge levels, number of smoothing sweeps, and their effect on solver speed for different node counts. The document provides guidance on selecting parameters for the GAMG solver in OpenFOAM simulations.
The Ottawa Hospital Cancer Centre implemented a TomoTherapy program for image-guided IMRT. Two TomoTherapy units were installed and are used to treat over 18 patients per day. Therapists are involved in all aspects of treatment planning and delivery using a full scope practice model. This allows for streamlined, accurate treatment and reduced toxicities. Over 300 patients have been treated since 2005 over a variety of cancer sites. Planning time has decreased as experience increased. Therapists feel this model allows them to better contribute to patient care.
Do not Match, Inherit: Fitness Surrogates for Genetics-Based Machine Learning...Xavier Llorà
A byproduct benefit of using probabilistic model-building genetic algorithms is the creation of cheap and accurate surrogate models. Learning classifier systems---and genetics-based machine learning in general---can greatly benefit from such surrogates which may replace the costly matching procedure of a rule against large data sets. In this paper we investigate the accuracy of such surrogate fitness functions when coupled with the probabilistic models evolved by the x-ary extended compact classifier system (xeCCS). To achieve such a goal, we show the need that the probabilistic models should be able to represent all the accurate basis functions required for creating an accurate surrogate. We also introduce a procedure to transform populations of rules based into dependency structure matrices (DSMs) which allows building accurate models of overlapping building blocks---a necessary condition to accurately estimate the fitness of the evolved rules.
This document appears to be a thesis submitted by Conor McMenamin for their B.Sc. in Computational Thinking at Maynooth University. The thesis investigates existing standards for selecting elliptic curves for use in elliptic curve cryptography (ECC) and whether it is possible to manipulate the standards to exploit weaknesses. It provides background on elliptic curve theory, cryptography, and standards. The document outlines requirements and proposes designing a system to test manipulating the standards by choosing curves with a user-selected parameter ("BADA55") to simulate exploiting a weakness. It describes implementing and testing the system before concluding and discussing future work.
20160219 - M. Agostini - Nuove tecnologie per lo studio del DNA tumorale libe...Roberto Scarafia
Nano Inspired Biomedicine Laboratory
1 Department of Surgical, Oncological and Gastroenterological Sciences, University of Padua, Italy.
2 Istituto di Ricerca Pediatrica- Città della Speranza, Padova, Italy.
Gaze transformers use vision transformers for gaze estimation from facial images. A hybrid model combines a CNN for image features with a transformer. It outperforms pure transformer and CNN models. Ablation studies show removing self-attention or convolutional layers hurts performance. Pre-training on a large dataset helps transformers achieve state-of-the-art results, and future methods may rely more on pre-training for gaze estimation tasks.
Paper published in The Journal of Pipeline Engineering: A practical Approach to Pipeline Corrosion Modelling: Part 2 - Short-term integrity forecasting
Accurate protein-protein docking with rapid calculationMasahito Ohue
This document describes improving protein-protein docking prediction accuracy while maintaining rapid calculation speed. The researchers:
1) Proposed adding a simple hydrophobic interaction model to MEGADOCK's docking score, which previously only considered shape complementarity and electrostatics.
2) The new model averages atomic contact energy values for receptor surface atoms, allowing hydrophobic interactions to be incorporated with one convolution calculation instead of multiple as in other methods.
Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infrared Spectroscopic Imaging
1. Towards Better than Human Capability in
Diagnosing Prostate Cancer
Using Infrared Spectroscopic Imaging
Xavier Llorà1, Rohith Reddy2,3, Brian Matesic2, Rohit Bhargava2,3
1 National Center for Supercomputing Applications & Illinois Genetic Algorithms Laboratory
2 Department of Bioengineering
3 Beckman Institute for Advanced Science and Technology
University of Illinois at Urbana-Champaign
Supported by AFOSR FA9550-06-1-0370, NSF at ISS-02-09199
DoD W81XWH-07-PRCP-NIA and the Faculty Fellows program at NCSA
GECCO 2007 HUMIES 1
2. Motivation
• The American Cancer Society estimated 234,460 new cases of
prostate cancer in 2006.
• Screening test:
– Digital rectal examination
– Prostate specific antigen (PSA) level
• Patients with suspicious results undergo a biopsy
• 1 million people undergo biopsies in the US alone per year
• Pathologists diagnose
– Crucial for the therapy
– Human accuracy (error < 5%)
– Costs
3. Current Diagnosis Procedure
• Biopsy, staining, microscopy, and manual recognition have been the diagnostic
procedure for the last 150 years.
4. Advances on Fourier Transform IR Imaging
• Infrared spectroscopy is a classical technique for
measuring chemical composition of specimens.
• At specific frequencies, the vibrational modes of
molecules are resonant with the frequency of infrared
light.
• Microscopy has developed to the point that resolution
matches a pixel with a cell (and keeps improving).
• It allows starting from the same data (stained tissue)
• Generates large volumes of data
5. Advances on Fourier Transform IR Imaging
6. Spectrum Analysis
• Microscopes generate a lot of data
• Per spot, the spectral signature requires GBs of storage
• Bhargava et al. (2005): feature extraction for tissue identification
• More than 200 potential features per spectrum (cell/pixel)
• First methodology that allowed tissue identification
7. Human Activity
• As mentioned earlier: an area of exclusively human activity
• Two key tasks:
– Using the spectra identify tissue type
– Using filtered tissue diagnose samples
• Both tasks:
– Require learning
– Can be modeled as supervised learning problems
• Challenges:
– Very large volumes of information
– Scalability and efficiency are a priority
– Interpretability of the models
8. Genetics-Based Machine Learning
• GA-driven learning mechanisms
• Mainly rule based models
• Pittsburgh approach
• Inherently parallel process
• GBML is a good candidate for very large problems
• Rule matching is known to be the governing factor in
the execution time (Llorà & Sastry, 2006)
9. Current Off-the-Shelf Systems
• There is a wide variety of GBML/LCS implementations
• Most of them:
– Oriented to run experiments on a single processor
– Have large memory footprints
– Typical problem = tens of attributes + thousands of
records
– Little attention to efficient implementation and
acceleration techniques (Llorà & Sastry, 2006)
• Cancer diagnosis overwhelms them:
– Hundreds of features
– Millions of records
11. NAX Mechanics
• The basic procedure:
1. Create an empty decision list
2. GA evolves a maximally accurate and maximally
general rule using the available instances
3. Add the evolved rule to the decision list
4. Remove all the instances covered by the rule
5. If there are uncovered instances go to step 2
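The five steps above are a classic sequential-covering loop. A minimal sketch in Python, where `evolve_rule(remaining)` is a hypothetical oracle standing in for the GA of step 2 (and is assumed to return a rule covering at least one remaining instance):

```python
def learn_decision_list(instances, evolve_rule):
    """Sequential covering, following steps 1-5 on the slide.

    `evolve_rule(remaining)` stands in for the GA of step 2 and is
    assumed to return a rule that covers at least one instance."""
    decision_list = []                         # step 1: empty decision list
    remaining = list(instances)
    while remaining:                           # step 5: loop while uncovered
        rule = evolve_rule(remaining)          # step 2: evolve one rule
        decision_list.append(rule)             # step 3: add it to the list
        remaining = [x for x in remaining
                     if not rule.matches(x)]   # step 4: drop covered instances
    return decision_list
```

At classification time the list is scanned in order and the first matching rule fires, which is why each rule only needs to be accurate on the instances the earlier rules left uncovered.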
12. A Little Story about Hardware
• SIMD (Single Instruction Multiple Data) architectures were hot
in the ‘80s supercomputing scene
• SIMD was widely used to perform binary operations on
two vector operands (Cray)
• Those processors were very expensive
• Consumer products took another path, the scalar one
– No SIMD support in hardware (left to the software)
– The massive spread of demand for CPUs made them cheaper
and cheaper
• Side effect:
– The hot trend in the ‘90s supercomputing scene became building
machines with large numbers of “cheap” processors
13. The Consumer Market Strikes Back
• Computer games and multimedia applications
– Use a particular type of matrix operations
– Graphics heavily use 4x4 matrix operations
– Digital signal processing applications also take advantage of it
• In late ‘90s Intel introduced SIMD instructions on Pentium chips via
MMX
– Multimedia oriented instructions
– Vector operations for fixed-size blocks
– Goal: accelerate multimedia apps via hardware
• Nowadays most vendors provide “multimedia” vector instruction
sets
– Intel: MMX, SSE, SSE2, SSE3
– AMD: 3Dnow!, 3Dnow+! (also support Intel’s MMX, SSE, SSE2)
– IBM/Motorola: AltiVec
14. A Simple Example (I/II)
• Match = a simple aligned ‘and’ and ‘equal’
Encoding (two bits per attribute): value 0 → 10, value 1 → 01, don’t care # → 11

Matched example:
Instance (0101): 10 01 10 01
Condition (01##): 10 01 11 11
Temp = Instance & Condition: 10 01 10 01
Temp == Instance: 11 11 11 11 → matched

Not-matched example:
Instance (1001): 01 10 10 01
Condition (01##): 10 01 11 11
Temp = Instance & Condition: 00 00 10 01
Temp == Instance: 10 01 11 11 → not matched
• Vector operations allow different manipulations
• 4 floats can be manipulated at once (spectra features)
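With a two-bits-per-attribute encoding (0 → 10, 1 → 01, # → 11), the whole match collapses to one AND and one equality test, which is exactly what a SIMD unit does across many attributes at once. A minimal sketch on plain Python integers (helper names are illustrative, not taken from NAX):

```python
# Two bits per attribute: value 0 -> 0b10, value 1 -> 0b01, '#' -> 0b11.
ENC = {'0': 0b10, '1': 0b01, '#': 0b11}

def encode(s):
    """Pack a string like '01##' into one integer, two bits per attribute."""
    word = 0
    for ch in s:
        word = (word << 2) | ENC[ch]
    return word

def matches(condition, instance):
    """One AND plus one equality test, as on the slide:
    the rule matches iff (instance & condition) == instance."""
    return (instance & condition) == instance

cond = encode('01##')    # 0b10011111
inst_a = encode('0101')  # 0b10011001, matches cond
inst_b = encode('1001')  # 0b01101001, does not match cond
```

A 128-bit SSE register applies the same AND/compare to 64 attributes in a single pair of instructions; the Python version only illustrates the logic.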
16. Exploiting the Inherent Parallelism
• Rule matching dominates the overall execution time
• Fitness calculation accounts for > 99% of it
• The parallelization method focused on reducing
communication cost
• The idea
– Most of the time is spent evaluating
– Parallelize the evaluation
– No master/slave
– All processors run the same GA seeded in the same manner
– Each processor only evaluates a chunk of the population (N/p)
– Broadcast the fitness of the chunk to the other processors
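The scheme above can be sketched by simulating the p identically seeded processors in a loop: each one evaluates only its N/p chunk, and the per-chunk fitness vectors are then shared with everyone (an allgather, in MPI terms). Function names are illustrative:

```python
def chunk_bounds(n, p, rank):
    """Half-open [lo, hi) bounds of processor `rank`'s share of n items,
    spreading any remainder over the lowest-ranked processors."""
    base, extra = divmod(n, p)
    lo = rank * base + min(rank, extra)
    hi = lo + base + (1 if rank < extra else 0)
    return lo, hi

def evaluate_population(population, fitness, p):
    """Each of the p (simulated) processors evaluates only its chunk;
    the per-chunk results are then 'broadcast' (here: concatenated),
    so every processor ends up with the full fitness vector."""
    gathered = []
    for rank in range(p):
        lo, hi = chunk_bounds(len(population), p, rank)
        gathered.extend(fitness(ind) for ind in population[lo:hi])
    return gathered
```

Because every processor runs the same GA with the same seed, the only traffic is the fitness exchange itself, which keeps the communication cost low.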
18. Prostate Cancer Data
1. Tissue identification
– Modeled as a supervised learning problem
– (Features, tissue type)
– The goal: Accurately retrieve epithelial tissue
2. Diagnosis
– Modeled as a supervised learning problem
– (Features, diagnosis)
– The goal: Accurately diagnose each cell (pixel) and
aggregate those diagnoses to generate a spot
(patient) diagnosis
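The slide leaves the pixel-to-spot aggregation rule unspecified; a minimal sketch assuming simple majority voting over the per-pixel labels (the voting rule is an assumption, not stated in the deck):

```python
from collections import Counter

def diagnose_spot(pixel_labels):
    """Aggregate per-pixel diagnoses into one spot (patient) diagnosis.
    Majority vote is assumed here; the deck does not specify the rule."""
    counts = Counter(pixel_labels)
    label, _ = counts.most_common(1)[0]
    return label
```

Aggregation is why spot-level accuracy can exceed pixel-level accuracy: scattered per-pixel errors are outvoted as long as most pixels in the spot are classified correctly.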
23. Filtered Tissue is Accurately Diagnosed
• Pixel cross-validation accuracy (87.34%)
• Spot accuracy
– 68 of 69 malignant spots
– 70 of 71 benign spots
• Human-competitive computer-aided diagnosis system
is possible
• First published results that fall in the range of
human error (<5%)
24. Breakthrough
• Current best published results, examples from
different fields:
– Image analysis – 77% accuracy [1] (cancer/no cancer)
– Raman spectroscopy – 86% accuracy [2]
– Genomic analysis – 76% accuracy [3] (low grade/high grade cancer)
1. R. Stotzka et al., Anal. Quant. Cytol. Histol., 17, 204-218 (1995).
2. P. Crow et al., Urol., 65, 1126-1130 (2005).
3. L. True et al., Proc. Natl. Acad. Sci. USA, 103(29), 10991-10996 (2006).
25. Conclusions
• Humans are the ultimate and only source of diagnosis
• FTIR imaging provides information about chemical
signatures and structure
• Large volumes of data forced efficient GBML design
• Diagnosis requires two steps
• The results on prostate cancer are human competitive
• No previous method has been able to match
pathologist accuracy
26. Towards Better than Human Capability in
Diagnosing Prostate Cancer
Using Infrared Spectroscopic Imaging
Xavier Llorà1, Rohith Reddy2,3, Brian Matesic2, Rohit Bhargava2,3
1 National Center for Supercomputing Applications & Illinois Genetic Algorithms Laboratory
2 Department of Bioengineering
3 Beckman Institute for Advanced Science and Technology
University of Illinois at Urbana-Champaign
Supported by AFOSR FA9550-06-1-0370, NSF at ISS-02-09199
DoD W81XWH-07-PRCP-NIA and the Faculty Fellows program at NCSA