Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infrared Spectroscopic Imaging

Cancer diagnosis is essentially a human task. Almost universally, the process requires the extraction of tissue (biopsy) and examination of its microstructure by a human. To improve diagnoses based on limited and inconsistent morphologic knowledge, a new approach has recently been proposed that uses molecular spectroscopic imaging to utilize microscopic chemical composition for diagnoses. In contrast to visible imaging, the approach results in very large data sets as each pixel contains the entire molecular vibrational spectroscopy data from all chemical species. Here, we propose data handling and analysis strategies to allow computer-based diagnosis of human prostate cancer by applying a novel genetics-based machine learning technique (NAX). We apply this technique to demonstrate both fast learning and accurate classification that, additionally, scales well with parallelization. Preliminary results demonstrate that this approach can improve current clinical practice in diagnosing prostate cancer.

Published in: Technology, Health & Medicine

  1. Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infrared Spectroscopic Imaging
     Xavier Llorà (1), Rohith Reddy (2,3), Brian Matesic (2), Rohit Bhargava (2,3)
     (1) National Center for Supercomputing Applications & Illinois Genetic Algorithms Laboratory
     (2) Department of Bioengineering
     (3) Beckman Institute for Advanced Science and Technology
     University of Illinois at Urbana-Champaign
     Supported by AFOSR FA9550-06-1-0370, NSF at ISS-02-09199, DoD W81XWH-07-PRCP-NIA, and the Faculty Fellows program at NCSA
     GECCO 2007 HUMIES
  2. Motivation
     • The American Cancer Society estimated 234,460 new cases of prostate cancer in 2006.
     • Screening tests:
       – Digital rectal examination
       – Prostate-specific antigen (PSA) level
     • Suspicious patients undergo a biopsy
     • 1 million people undergo biopsies per year in the US alone
     • Pathologists diagnose:
       – Crucial for the therapy
       – Human accuracy (error < 5%)
       – Costs
  3. Current Diagnosis Procedure
     • Biopsy-staining-microscopy-manual recognition has been the diagnosis procedure for the last 150 years.
  4. Advances on Fourier Transform IR Imaging
     • Infrared spectroscopy is a classical technique for measuring the chemical composition of specimens.
     • At specific frequencies, the vibrational modes of molecules are resonant with the frequency of infrared light.
     • Microscopes have developed to the point where the resolution matches a pixel with a cell (and it keeps improving).
     • It allows starting from the same data (stained tissue)
     • It generates large volumes of data
  5. Advances on Fourier Transform IR Imaging
  6. Spectrum Analysis
     • Microscopes generate a lot of data
     • Per spot, the spectral signature requires GBs of storage
     • Bhargava et al. (2005): feature extraction for tissue identification
     • More than 200 potential features per spectrum (cell/pixel)
     • First methodology that allowed tissue identification
  7. Human Activity
     • As mentioned earlier: an area of exclusive human activity
     • Two key tasks:
       – Using the spectra, identify the tissue type
       – Using the filtered tissue, diagnose samples
     • Both tasks:
       – Require learning
       – Can be modeled as supervised learning problems
     • Challenges:
       – Very large volumes of information
       – Scalability and efficiency are a priority
       – Interpretability of the models
  8. Genetics-Based Machine Learning
     • GA-driven learning mechanisms
     • Mainly rule-based models
     • Pittsburgh approach
     • Inherently parallel process
     • GBML is a good candidate for very large problems
     • Rule matching is known to be the governing factor in the execution time (Llorà & Sastry, 2006)
  9. Current Off-the-Shelf Systems
     • There is a wide variety of GBML/LCS implementations
     • Most of them:
       – Are oriented to running experiments on a single processor
       – Have large memory footprints
       – Target a typical problem of tens of attributes and thousands of records
       – Pay little attention to efficient implementation and acceleration techniques (Llorà & Sastry, 2006)
     • Cancer diagnosis overwhelms them:
       – Hundreds of features
       – Millions of records
  10. NAX Specs
      • Affordable memory footprints
      • Squeeze out every bit of computation available
      • Efficient implementations:
        – Hardware acceleration
        – Massive parallelism
  11. NAX Mechanics
      • The basic procedure:
        1. Create an empty decision list
        2. The GA evolves a maximally accurate and maximally general rule using the available instances
        3. Add the evolved rule to the decision list
        4. Remove all the instances covered by the rule
        5. If there are uncovered instances, go to step 2
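The five steps above describe a covering loop that builds a decision list one rule at a time. The following is a minimal sketch of that loop, not the NAX implementation: the slides do not show how the GA evolves a rule, so `evolve_rule` here is a hypothetical stand-in that exhaustively searches ternary conditions (`0`, `1`, `#` for "don't care") and prefers the most accurate rule, breaking ties by the number of instances covered. The function names and the string encoding of instances are assumptions made for illustration.

```python
from itertools import product


def matches(condition, features):
    """A condition matches when every position is '#' or equals the feature."""
    return all(c == "#" or c == f for c, f in zip(condition, features))


def evolve_rule(instances):
    """Hypothetical stand-in for the GA (step 2): exhaustive search for the
    most accurate rule, ties broken in favour of the most general one."""
    n_attributes = len(instances[0][0])
    labels = sorted({label for _, label in instances})
    best, best_key = None, None
    for cond in product("01#", repeat=n_attributes):
        covered = [label for feats, label in instances if matches(cond, feats)]
        if not covered:
            continue  # a rule must cover at least one instance
        for label in labels:
            key = (covered.count(label) / len(covered), len(covered))
            if best_key is None or key > best_key:
                best, best_key = ("".join(cond), label), key
    return best


def learn_decision_list(instances):
    decision_list = []                      # step 1: empty decision list
    remaining = list(instances)
    while remaining:                        # step 5: repeat while uncovered
        condition, label = evolve_rule(remaining)          # step 2
        decision_list.append((condition, label))           # step 3
        remaining = [(f, l) for f, l in remaining
                     if not matches(condition, f)]         # step 4
    return decision_list


def classify(decision_list, features, default=None):
    """Return the class of the first rule in the list that matches."""
    for condition, label in decision_list:
        if matches(condition, features):
            return label
    return default


if __name__ == "__main__":
    data = [("00", "a"), ("01", "a"), ("10", "b"), ("11", "b")]
    for rule in learn_decision_list(data):
        print(rule)
```

Because every evolved rule covers at least one remaining instance, the loop always terminates. The exhaustive search only works for a handful of attributes, which is exactly why a GA (or another heuristic search) is needed at the scale the slides describe.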
  12. A Little Story about Hardware
      • SIMD (Single Instruction, Multiple Data) architectures were hot in the ’80s supercomputing scene
      • SIMD was widely used to perform binary operations between two vector operands (Cray)
      • Those processors were very expensive
      • Consumer products took another path, the scalar one
        – No SIMD support in hardware (left to the software)
        – The massive, widespread need for CPUs made them cheaper and cheaper
      • Side effect:
        – The hot topic in the ’90s supercomputing scene became building machines with large numbers of “cheap” processors
  13. The Consumer Market Strikes Back
      • Computer games and multimedia applications
        – Use a particular type of matrix operations
        – Graphics heavily use 4x4 matrix operations
        – Digital signal processing applications also take advantage of them
      • In the late ’90s Intel introduced SIMD instructions on Pentium chips via MMX
        – Multimedia-oriented instructions
        – Vector operations for fixed-size blocks
        – Goal: accelerate multimedia apps via hardware
      • Nowadays most vendors provide “multimedia” vector instruction sets
        – Intel: MMX, SSE, SSE2, SSE3
        – AMD: 3Dnow!, 3Dnow+! (also supports Intel’s MMX, SSE, SSE2)
        – IBM/Motorola: AltiVec
  14. A Simple Example (I/II)
      • Match = a simple aligned ‘and’ and ‘equal’

                             Instance 0101     Instance 1001
        Instance             10 01 10 01       01 10 10 01
        & Condition (01##)   10 01 11 11       10 01 11 11
        Temp                 10 01 10 01       00 00 10 01
        == Instance          10 01 10 01       01 10 10 01
        Comparison bits      11 11 11 11       10 01 11 11
        Outcome              Matched           Not Matched

      • Vector operations allow different manipulations
      • 4 floats can be manipulated at once (spectra features)
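The match in this example can be reproduced with plain integer operations. The sketch below assumes the two-bit encoding that the example's bit patterns imply (`0` as `10`, `1` as `01`, `#` as `11`); the names `ENCODING`, `encode`, and `rule_matches` are illustrative and do not come from the slides.

```python
# Two bits per symbol, as read off the example: 0 -> 10, 1 -> 01, # -> 11.
ENCODING = {"0": 0b10, "1": 0b01, "#": 0b11}


def encode(symbols):
    """Pack a string over {0, 1, #} into one integer, two bits per symbol."""
    word = 0
    for symbol in symbols:
        word = (word << 2) | ENCODING[symbol]
    return word


def rule_matches(condition, instance):
    """Aligned 'and' followed by 'equal': the rule matches when AND-ing the
    instance with the condition leaves the instance unchanged."""
    packed_instance = encode(instance)
    temp = packed_instance & encode(condition)
    return temp == packed_instance


if __name__ == "__main__":
    print(rule_matches("01##", "0101"))  # first column of the example
    print(rule_matches("01##", "1001"))  # second column of the example
```

A single `&` and a single `==` test all attributes at once here, which is the same idea the hardware vector instructions exploit on wider registers.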
  15. A Simple Example (II/II)
      Scalar operands (one element per operation):
        OP1      1   2   3   4
        OP2      1   2   3   4
        Res      1   4   9  16
      Vector operands (all four elements in one operation):
        vecOP1   1   2   3   4
        vecOP2   1   2   3   4
        vecRes   1   4   9  16
  16. Exploiting the Inherent Parallelism
      • Rule matching rules the overall execution time
      • Fitness calculation accounts for > 99% of it
      • The parallelization method focuses on reducing communication cost
      • The idea:
        – Most of the time is spent evaluating
        – Evaluate the evaluation
        – No master/slave
        – All processors run the same GA seeded in the same manner
        – Each processor only evaluates a chunk of the population (N/p)
        – Broadcast the fitness of the chunk to the other processors
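The chunked evaluation can be illustrated without any message-passing library. This is a single-process simulation under stated assumptions: `chunk_bounds`, `evaluate_chunk`, and `simulated_allgather` are hypothetical names, the loop over ranks stands in for p processors working concurrently, and list concatenation stands in for the broadcast of each chunk's fitness values.

```python
def chunk_bounds(n, p, rank):
    """Half-open index range [start, stop) of the chunk owned by `rank`
    when n individuals are split as evenly as possible over p processors."""
    base, extra = divmod(n, p)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def evaluate_chunk(population, fitness, p, rank):
    """Each processor evaluates only its own chunk of about N/p individuals."""
    start, stop = chunk_bounds(len(population), p, rank)
    return [fitness(individual) for individual in population[start:stop]]


def simulated_allgather(population, fitness, p):
    """Stand-in for the broadcast step: after it, every processor would hold
    the fitness of the whole population, in population order."""
    gathered = []
    for rank in range(p):  # in a real run these iterations happen in parallel
        gathered.extend(evaluate_chunk(population, fitness, p, rank))
    return gathered


if __name__ == "__main__":
    population = list(range(10))
    print(simulated_allgather(population, lambda x: x * x, p=3))
```

Since every processor runs the same GA from the same seed, the populations stay identical across processors, so only fitness values need to travel; that is what keeps the communication cost low.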
  17. NAX: Stretching GBML
  18. Prostate Cancer Data
      1. Tissue identification
         – Modeled as a supervised learning problem
         – (Features, tissue type)
         – The goal: accurately retrieve epithelial tissue
      2. Diagnosis
         – Modeled as a supervised learning problem
         – (Features, diagnosis)
         – The goal: accurately diagnose each cell (pixel) and aggregate those diagnoses to generate a spot (patient) diagnosis
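The slides do not say how pixel diagnoses are aggregated into a spot diagnosis. As one plausible illustration only, the sketch below labels a spot malignant when the fraction of its pixels diagnosed as malignant exceeds a threshold. The function name `diagnose_spot`, the label strings, and the default threshold of 0.5 are all assumptions, not values from the presentation.

```python
def diagnose_spot(pixel_diagnoses, threshold=0.5):
    """Aggregate per-pixel diagnoses into one spot diagnosis.

    Assumption: a simple vote over pixels; the threshold of 0.5 is a
    placeholder, not a value taken from the slides.
    """
    if not pixel_diagnoses:
        raise ValueError("a spot needs at least one diagnosed pixel")
    malignant = sum(1 for d in pixel_diagnoses if d == "malignant")
    fraction = malignant / len(pixel_diagnoses)
    return "malignant" if fraction > threshold else "benign"


if __name__ == "__main__":
    print(diagnose_spot(["malignant", "malignant", "benign"]))
    print(diagnose_spot(["benign", "benign", "malignant"]))
```

Aggregating over many pixels is one way a spot-level diagnosis can be more reliable than any individual pixel-level one, since isolated pixel errors are outvoted.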
  19. GBML Identifies Tissue Types Accurately
      Figure panel: Original
  20. GBML Identifies Tissue Types Accurately
      Figure panels: OK, Misclassified
      • Accuracy >96%
      • Mistakes on minority classes (not targeted) and boundaries
  21. Filtered Tissue is Accurately Diagnosed
      Figure panel: Original
  22. Filtered Tissue is Accurately Diagnosed
      Figure panel: Diagnosed
  23. Filtered Tissue is Accurately Diagnosed
      • Pixel cross-validation accuracy (87.34%)
      • Spot accuracy
        – 68 of 69 malignant spots
        – 70 of 71 benign spots
      • A human-competitive computer-aided diagnosis system is possible
      • First published results that fall in the range of human error (<5%)
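The spot counts above can be turned into rates to check the "<5% error" claim. The short calculation below assumes that "68 of 69" and "70 of 71" count correctly diagnosed spots; under that reading the overall spot accuracy is 138/140, about 98.57%, which corresponds to an error of about 1.43%.

```python
# Spot-level counts as reported on the slide (assumed to be correct diagnoses).
malignant_correct, malignant_total = 68, 69
benign_correct, benign_total = 70, 71


def rate(correct, total):
    """Fraction of correctly diagnosed spots."""
    return correct / total


malignant_rate = rate(malignant_correct, malignant_total)
benign_rate = rate(benign_correct, benign_total)
overall_accuracy = rate(malignant_correct + benign_correct,
                        malignant_total + benign_total)
overall_error = 1.0 - overall_accuracy

if __name__ == "__main__":
    print(f"malignant spots: {malignant_rate:.2%}")
    print(f"benign spots:    {benign_rate:.2%}")
    print(f"overall:         {overall_accuracy:.2%}")
    print(f"overall error:   {overall_error:.2%}")
```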
  24. Breakthrough
      • Current best published results, examples from different fields:
        – Image analysis: 77% accuracy (1) (cancer/no cancer)
        – Raman spectroscopy: 86% accuracy (2)
        – Genomic analysis: 76% (low grade/high grade cancer)
      References:
        1. R. Stotzka et al. Anal. Quant. Cytol. Histol. 17, 204-218 (1995).
        2. P. Crow et al. Urol. 65, 1126-1130 (2005).
        3. L. True et al. Proc. Natl. Acad. Sci. U.S.A. 103(29), 10991-10996 (2006).
  25. Conclusions
      • Humans are the ultimate and only source of diagnosis
      • FTIR imaging provides information about chemical signatures and structure
      • Large volumes of data forced an efficient GBML design
      • Diagnosis requires two steps
      • The results on prostate cancer are human-competitive
      • No previous method has been able to match pathologist accuracy