Computer-aided Detection of Pulmonary 
Nodules using Genetic Programming 
Wook-Jin Choi and Tae-Sun Choi
Contents 
• Introduction 
• Lung Segmentation based on 3D Approach 
• Nodule Candidates Detection and Feature 
Extraction 
• Genetic Programming Based Classification 
• Experimental Results 
• Conclusions 
• References 
Introduction 
• Pulmonary nodule detection is an attractive application of computer-aided 
detection (CAD) because lung cancer is the leading cause of cancer 
deaths. 
• If lung cancer is detected at an early stage, the 3-year survival rate is 
more than 80%. 
• Recently, researchers have developed a number of CAD methods for lung 
nodules to aid radiologists in identifying nodule candidates from CT 
images. 
• Current CT technology allows for near isotropic, submillimeter resolution 
acquisition of the complete chest in a single breath hold. 
• These thin-slice chest scans have become indispensable in thoracic 
radiology, but have also substantially increased the data load for 
radiologists. 
• Automating the analysis of such data is, therefore, a necessity and this 
has created a rapidly developing research area in medical imaging. 
Related Works 
• Template matching methods 
– Genetic Algorithm Template Matching [10] 
– 3D Template Matching [11] 
• Model based methods 
– Patient-specific models [5] 
– Surface normal overlap model [7] 
• Machine learning techniques 
– Neural network [6] 
– Fuzzy c-means clustering [9] 
• Digital filtering 
– Quantized convergence index filter [8] 
– Iris filter [13] 
• Statistical analysis [12] 
Proposed Algorithm 
Flow chart of Pulmonary nodule detection 
Lung Segmentation based on 3D Approach 
• Select an adaptive threshold value for every slice in the CT image 
sequence using the diagonal intensity histogram [4]. 
• Each CT image is divided into a background area (body) and a 
foreground area (air or lung), as shown below. 
Original CT image and CT image converted with the threshold
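The per-slice thresholding step can be sketched as follows. The slides do not say how the threshold is derived from the diagonal histogram, so the isodata-style iteration below (midpoint of the two class means) is an assumption, and the function names `diagonal_threshold` and `binarize` are hypothetical.

```python
def diagonal_threshold(slice_2d, tol=0.5):
    """Estimate a threshold from the pixels on the main diagonal of one slice.

    Assumption: an isodata-style iteration stands in for the
    diagonal-histogram rule, which the slides do not detail.
    """
    n = min(len(slice_2d), len(slice_2d[0]))
    diag = [slice_2d[i][i] for i in range(n)]
    t = sum(diag) / len(diag)  # start from the mean diagonal intensity
    while True:
        low = [v for v in diag if v < t]
        high = [v for v in diag if v >= t]
        if not low or not high:
            return t
        new_t = (sum(low) / len(low) + sum(high) / len(high)) / 2
        if abs(new_t - t) < tol:
            return new_t
        t = new_t


def binarize(slice_2d, threshold):
    """Mark dark pixels (air or lung) as foreground (1) and body as background (0)."""
    return [[1 if v < threshold else 0 for v in row] for row in slice_2d]


# Toy 4x4 slice: bright body tissue surrounding a dark lung region
toy = [
    [40, 35, 30, 45],
    [38, -800, -790, 42],
    [41, -810, -805, 37],
    [39, 36, 44, 40],
]
t = diagonal_threshold(toy)
mask = binarize(toy, t)
```

On the toy slice the threshold falls between the two intensity groups, so only the four dark pixels become foreground.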
Lung Segmentation based on 3D Approach 
• Segment the lung region and remove the rim (the outer part of the 
body). 
• Correct the contour of the lung volume so that wall-side nodules 
excluded by the segmentation are recovered. 
Extracted lung region using 3D connected component labeling, and 
contour-corrected lung region (containing the wall-side nodule)
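A minimal sketch of the 3D connected component labeling step, assuming the binary volume is a nested list indexed as `volume[z][y][x]`. The 18-connectivity and the choice of the two largest components follow the speaker notes; the function names are hypothetical, and the rolling-ball contour correction is not shown.

```python
from collections import deque
from itertools import product

# 18-connectivity: neighbours that share a face or an edge
# (offsets with one or two non-zero components)
OFFSETS_18 = [d for d in product((-1, 0, 1), repeat=3) if 1 <= sum(map(abs, d)) <= 2]


def label_components(volume):
    """Return the connected components as lists of (z, y, x) foreground voxels."""
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    seen = set()
    components = []
    for start in product(range(nz), range(ny), range(nx)):
        z, y, x = start
        if not volume[z][y][x] or start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        voxels = []
        while queue:  # breadth-first flood fill from the seed voxel
            cz, cy, cx = queue.popleft()
            voxels.append((cz, cy, cx))
            for dz, dy, dx in OFFSETS_18:
                n = (cz + dz, cy + dy, cx + dx)
                if (0 <= n[0] < nz and 0 <= n[1] < ny and 0 <= n[2] < nx
                        and volume[n[0]][n[1]][n[2]] and n not in seen):
                    seen.add(n)
                    queue.append(n)
        components.append(voxels)
    return components


def two_largest(components):
    """Keep the two components with the most voxels (taken here as the lungs)."""
    return sorted(components, key=len, reverse=True)[:2]


# Toy volume: a 4-voxel blob, a 3-voxel blob and one isolated noise voxel
volume = [
    [[1, 1, 0, 0, 0, 1, 1],
     [1, 0, 0, 0, 0, 0, 1],
     [0, 0, 0, 1, 0, 0, 0]],
    [[1, 0, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0, 0, 0]],
]
components = label_components(volume)
lungs = two_largest(components)
```

In the toy volume the isolated voxel is dropped because it forms the smallest component.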
Nodule Candidates Detection and Feature Extraction 
ROI Extraction 
• Adaptive multiple thresholding method. 
– The traditional multiple thresholding method is not adaptive and 
produces many grey-level steps. 
– We calculate an adaptive threshold value from the diagonal 
histogram at every slice of the lung volume. 
– This value is the base threshold for multiple thresholding. 
– We derive five additional threshold values: base threshold +50, 
-50, -100, -150 and -200. 
6-stepped ROI and extracted nodule candidates
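The six thresholds can be derived from the base value as in the sketch below. The offsets come from the slide; the names `threshold_levels` and `quantize`, and the rule that an intensity's grey-level step is the number of thresholds it reaches, are assumptions made for illustration.

```python
# Base threshold (offset 0) plus the five offsets listed on the slide
OFFSETS = (50, 0, -50, -100, -150, -200)


def threshold_levels(base):
    """Return the six thresholds, lowest first."""
    return sorted(base + off for off in OFFSETS)


def quantize(value, levels):
    """Map an intensity to a step: the number of thresholds it reaches or exceeds.

    Step 0 means the value lies below every threshold (treated as background here).
    """
    return sum(value >= t for t in levels)


levels = threshold_levels(-400)
steps = [quantize(v, levels) for v in (-700, -480, -300)]
```

With a hypothetical base threshold of -400, the levels run from -600 to -350 in increments of 50.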
Nodule Candidates Detection 
• We remove the vessels and noise in the lung volume using a 
rule-based classifier. 
• Vessel removal: a vessel is identified by its volume, elongation 
factor and compactness. 
– Its volume is much larger than that of a nodule. 
– It is longer than a nodule. 
– It is not a compact object. 
• Noise removal 
– An ROI is discarded if its radius is smaller than 3 mm or larger 
than 30 mm. 
• The remaining ROIs are the nodule candidates. 
6-stepped ROI and extracted nodule candidates
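The rules above can be sketched as a small filter. Only the 3 mm and 30 mm radius bounds come from the slides. The vessel limits (`max_volume_mm3`, `max_elongation`, `min_compactness`), the equivalent-sphere definition of the approximated radius, and the choice to reject an ROI when any single vessel rule fires are hypothetical placeholders.

```python
import math


def approx_radius_mm(volume_mm3):
    """Radius of the sphere with the same volume (an assumed definition)."""
    return (3.0 * volume_mm3 / (4.0 * math.pi)) ** (1.0 / 3.0)


def is_nodule_candidate(roi, max_volume_mm3=15000.0, max_elongation=4.0,
                        min_compactness=0.3):
    """Apply the noise and vessel rules to one ROI.

    roi: dict with the keys 'volume_mm3', 'elongation' and 'compactness'.
    The three keyword limits are hypothetical; tune them on real data.
    """
    radius = approx_radius_mm(roi["volume_mm3"])
    if radius < 3.0 or radius > 30.0:
        return False  # noise: too small or too large
    is_vessel = (
        roi["volume_mm3"] > max_volume_mm3      # much larger than a nodule
        or roi["elongation"] > max_elongation   # longer than a nodule
        or roi["compactness"] < min_compactness # not a compact object
    )
    return not is_vessel


nodule = {"volume_mm3": 523.6, "elongation": 1.2, "compactness": 0.8}
noise = {"volume_mm3": 50.0, "elongation": 1.0, "compactness": 0.9}
vessel = {"volume_mm3": 2000.0, "elongation": 8.0, "compactness": 0.1}
kept = [is_nodule_candidate(r) for r in (nodule, noise, vessel)]
```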
Feature Extraction 
• 3D geometric features 
– Volume 
– Elongation factor 
– Compactness 
– Approximated radius 
• 2D pixel-based features 
– Use the median slice of each nodule candidate (the median slice has the largest area). 
– To extract 2D texture features, we normalize the image size of the nodule 
candidates. 
– Candidates are divided into 3 size classes by radius before the features are extracted. 
• < 5 mm: the image matrix is 8x8. 
• 5 mm to 10 mm: the image matrix is 16x16. 
• > 10 mm: the image matrix is 32x32. 
– Extract 14 features from the image matrix. 
• Mean, variance, skewness, kurtosis, area, radius and the 8 largest eigenvalues. 
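A minimal sketch of the size-class selection and the four intensity moments. The size boundaries follow the slide; the use of population (not sample) moments and the function names are assumptions, and the area, radius and eigenvalue features are not shown.

```python
import math


def matrix_size(radius_mm):
    """Pick the side length of the normalized image matrix from the candidate radius."""
    if radius_mm < 5.0:
        return 8
    if radius_mm <= 10.0:
        return 16
    return 32


def intensity_moments(image):
    """Mean, variance, skewness and kurtosis over all pixels of a 2D image."""
    pixels = [float(v) for row in image for v in row]
    n = len(pixels)
    mean = sum(pixels) / n
    var = sum((v - mean) ** 2 for v in pixels) / n  # population variance
    if var == 0.0:
        return mean, 0.0, 0.0, 0.0  # flat image: higher moments are undefined
    std = math.sqrt(var)
    skew = sum(((v - mean) / std) ** 3 for v in pixels) / n
    kurt = sum(((v - mean) / std) ** 4 for v in pixels) / n
    return mean, var, skew, kurt


mean, var, skew, kurt = intensity_moments([[1, 2], [3, 4]])
```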
Feature Extraction 
Index Feature 
1 Z position 
2 Mean 
3 Variance 
4 Skewness 
5 Kurtosis 
6 Area 
7 Radius 
8 Perimeter 
9 Compactness 
10~17 Largest Eigenvalue 1~8 
18 X centroid 
19 Y centroid 
20 Z centroid 
21 Width 
22 Height 
23 Depth 
24 Size
Genetic Programming Based Classification 
• Genetic Programming (GP) 
– An evolutionary optimization technique [14]. 
• The basic structure of GP is very similar to that of a 
Genetic Algorithm (GA). 
• The chromosome 
– GA: variables (binary digits) 
– GP: a program (tree or graph) 
Genetic Programming Based Classification 
A function represented as a tree structure
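To make the tree representation concrete, the sketch below evaluates a small expression tree on one feature vector. The nested-tuple encoding and the `evaluate` function are illustrative assumptions; the operator names and the 1-based `P` indexing mirror the example expressions shown later in the deck.

```python
import math
import operator

# Function nodes are tuples (name, child, ...); leaves are feature
# references ("P", index) or numeric constants.
FUNCTIONS = {
    "plus": operator.add,
    "minus": operator.sub,
    "times": operator.mul,
    "sin": math.sin,
    "cos": math.cos,
}


def evaluate(node, features):
    """Recursively evaluate an expression tree on one feature vector (1-based indices)."""
    if isinstance(node, (int, float)):
        return float(node)
    if node[0] == "P":
        return float(features[node[1] - 1])
    args = [evaluate(child, features) for child in node[1:]]
    return FUNCTIONS[node[0]](*args)


# minus(plus(P(4), P(7)), times(0.5, P(2)))
tree = ("minus", ("plus", ("P", 4), ("P", 7)), ("times", 0.5, ("P", 2)))
features = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
score = evaluate(tree, features)
```

Here the tree computes (4 + 7) - 0.5 * 2, so `score` is 10.0.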
Genetic Programming Based Classification 
• The goal of GP evolution is to reduce false positives 
(FP) and increase true positives (TP). 
• In the proposed scheme, an optimized classifier is 
evolved from a combination of features and random 
constant values. 
• GP selects suitable features from all 
extracted features and combines them 
with mathematical operators. 
• GP generates individual classifiers, which are 
evaluated by the fitness function. 
• The evolved expression converts the complex input features to a 
single value. 
Genetic Programming Based Classification 
• GP chromosome 
– The terminal set: the elements of the feature vector extracted from 
the nodule candidate images, and randomly generated constants within 
the range [0, 1]. 
– The function set: the four standard arithmetic operators (plus, 
minus, multiply and division) and the additional mathematical operators 
log, exp, abs, sin and cos. All operators in the function set are 
protected to avoid exceptions. 
• GP evolves combinations of the terminal set and the 
function set. 
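Protected operators can be sketched as below. The slides do not state which fallback values are used, so the conventions here (division by zero returns 1, log of zero returns 0, the exp argument is clamped) are assumptions chosen only to show the idea.

```python
import math


def protected_div(a, b):
    """Division that returns 1.0 when the denominator is zero (assumed convention)."""
    return a / b if b != 0 else 1.0


def protected_log(a):
    """Natural log of |a|, returning 0.0 when a is zero (assumed convention)."""
    return math.log(abs(a)) if a != 0 else 0.0


def protected_exp(a, limit=700.0):
    """exp with the argument clamped so the result stays a finite float."""
    return math.exp(min(a, limit))
```

With these definitions any tree built from the function set returns a finite number for finite inputs of moderate size, so a single bad individual cannot abort a generation.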
Genetic Programming Based Classification 
• Fitness function: evaluates every individual in a GP 
generation 
– True positive rate (TPR) 
$$\mathrm{TPR} = \frac{TP}{TP + FN}$$
– Specificity (SPC) 
• SPC equals 1 minus the false positive rate (FPR) and is also called the true negative rate (TNR). 
$$\mathrm{SPC} = \frac{TN}{FP + TN} = 1 - \frac{FP}{FP + TN} = 1 - \mathrm{FPR}$$
– Area under the ROC curve (Az) 
• The ROC curve plots the TP rate against the FP rate for different threshold values. 
• Az is the area under the ROC curve and a good measure of classifier performance under 
different conditions. 
– Fitness function 
$$f = \mathrm{TPR} \times \mathrm{SPC} \times A_z$$
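A minimal sketch of the fitness computation for one individual, given its output scores and the true labels. The rank-based estimate of Az, the cut-off `threshold=0.0` and the function names are assumptions; the slides only define the three factors and their product.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fitness(scores, labels, threshold=0.0):
    """f = TPR * SPC * Az; 'threshold' is an assumed classification cut-off.

    Both classes must be present in 'labels', otherwise a rate is undefined.
    """
    pred = [1 if s > threshold else 0 for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(pred, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(pred, labels))
    tn = sum(p == 0 and y == 0 for p, y in zip(pred, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(pred, labels))
    tpr = tp / (tp + fn)
    spc = tn / (tn + fp)
    return tpr * spc * auc(scores, labels)


scores = [2.0, 1.5, -0.5, 0.3, -1.0, -2.0]
labels = [1, 1, 1, 0, 0, 0]
az = auc(scores, labels)
f = fitness(scores, labels)
```

In the toy data one nodule is missed and one non-nodule is accepted, so TPR = SPC = 2/3, Az = 8/9 and f = 32/81.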
Genetic Programming Based Classification 
Parameter                    Value 
Objective                    To evolve maximum fitness 
Selection                    Generational 
Population size              300 
Generation size              80 
Initial tree depth limit     6 
Initial population           Ramped half and half 
GP operator probabilities    Variable ratio of crossover and mutation 
Sampling                     Tournament 
Survival mechanism           Keep the best individuals 
Real max. tree level         30 
Genetic programming parameters 
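The sampling and survival settings can be illustrated with a short sketch of one generational step. The population size and generation count are taken from the table; the tournament size `k`, the number of elite individuals `n_elite` and the `vary` callback (standing in for crossover and mutation) are hypothetical.

```python
import random

POPULATION_SIZE = 300  # from the parameter table
GENERATIONS = 80       # from the parameter table


def tournament(population, fitnesses, k, rng):
    """Pick k distinct random individuals and return the fittest of them."""
    contenders = rng.sample(range(len(population)), k)
    best = max(contenders, key=lambda i: fitnesses[i])
    return population[best]


def next_generation(population, fitnesses, vary, rng, k=3, n_elite=1):
    """Copy the best individuals, then fill up with varied tournament winners."""
    ranked = sorted(range(len(population)), key=lambda i: fitnesses[i], reverse=True)
    new_pop = [population[i] for i in ranked[:n_elite]]
    while len(new_pop) < len(population):
        parent = tournament(population, fitnesses, k, rng)
        new_pop.append(vary(parent, rng))
    return new_pop


# Toy run: individuals are integers and fitness equals the value itself
rng = random.Random(0)
population = list(range(10))
fitnesses = [float(v) for v in population]
new_pop = next_generation(population, fitnesses, lambda p, r: p, rng)
```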
Genetic Programming Based Classification 
• Examples of GP 
– minus(minus(P(21,:),exp(P(23,:))),minus(mypower(mylog(plus(times(P(14,:),minus(P(23,:),mypower(mylog(plus(times(P(12,:),minus(P(11,:),mypower(P(13,:),P(13,:)))),P(22,:))),P(13,:)))),minus(P(20,:),cos(exp(P(7,:)))))),mypower(exp(P(7,:)),P(7,:))),times(minus(minus(mypower(exp(P(23,:)),P(7,:)),P(11,:)),times(exp(P(23,:)),P(12,:))),P(11,:)))) 
– minus(minus(plus(P(4,:),P(7,:)),sin(minus(P(7,:),mypower(P(24,:),plus(minus(P(7,:),P(11,:)),mypower(P(15,:),P(7,:))))))),mypower(mypower(mypower(P(24,:),exp(P(11,:))),minus(plus(P(10,:),mypower(plus(minus(P(13,:),sin(exp(P(12,:)))),mypower(P(24,:),plus(P(13,:),P(4,:)))),plus(plus(minus(0.35089,0.35089),P(3,:)),P(7,:)))),P(11,:))),minus(plus(P(10,:),minus(P(4,:),plus(P(10,:),P(7,:)))),mypower(minus(plus(P(4,:),exp(P(2,:))),minus(P(9,:),P(4,:))),P(11,:))))) 
Experimental Results 
• Lung Image Database Consortium (LIDC) database [15] 
– Used to evaluate the performance of the proposed method. 
– The LIDC is developing a publicly available database of thoracic computed 
tomography (CT) scans as a medical imaging research resource to promote 
the development of computer-aided detection or characterization of 
pulmonary nodules. 
– The database is separated into 84 cases, each containing around 100-400 
Digital Imaging and Communications in Medicine (DICOM) images and an XML data file 
containing the physician annotations. 
• We applied our method to 32 scans containing 153 nodules 
and 7528 slices. The pixel size in the database ranged from 0.65 
to 0.75 mm and the reconstruction interval ranged. 
• Half of the dataset (16 scans) is used for training and the other half 
(another 16 scans) is used for testing the classifier. 
Experimental Results 
The result of pulmonary nodule detection: (a) 43rd slice, (b) volume 
rendering 
Experimental Results 
Data set   TPR      FPR     Az 
learn      93.33%   0.127   0.934 
test       91.67%   0.138   0.897 
all        92.31%   0.133   0.912 
The results of the GP-based classifier
Experimental Results 
ROC curves of GP based classifier with respect to three datasets 
Conclusion 
• We have proposed a novel pulmonary nodule detection algorithm for CT 
images. 
• The lung region is segmented using a method based on adaptive thresholding 
and voxel labeling. 
• Nodule candidates are then detected using adaptive multiple 
thresholding and a rule-based classifier with 3D geometric features. 
• Next, 3D and 2D features are extracted from the detected nodule 
candidates. 
• Finally, the extracted features are optimized and then classified into 
nodule and non-nodule using GP. 
• We applied the proposed algorithm to the LIDC database of the NCI. 
• The method greatly reduced the FP rate. 
• The number of FPs per scan is only 6.5 at more than 90% sensitivity. 
• The results show the superiority of the proposed method. 
References 
• [1] Ahmedin Jemal, Rebecca Siegel, Elizabeth Ward, Yongping Hao, Jiaquan Xu, and Michael J 
Thun, “Cancer statistics, 2009,” CA Cancer J Clin, vol. 59, no. 4, pp. 225–49, Jan 2009. 
• [2] K-W Jung, Y-J Won, S Park, H-J Kong, J Sung, H-R Shin, E-Cl Park, and J S Lee, “Cancer 
statistics in korea: incidence, mortality and survival in 2005,” J Korean Med Sci, vol. 24, no. 6, 
pp. 995–1003, Dec 2009. 
• [3] Qiang Li, “Recent progress in computer-aided diagnosis of lung nodules on thin-section 
ct.,” Comput Med Imaging Graph, vol. 31, no. 4-5, pp. 248–257, 2007. 
• [4] S G Armato, M L Giger, C J Moran, J T Blackburn, K Doi, and H MacMahon, “Computerized 
detection of pulmonary nodules on ct scans,” Radiographics, vol. 19, no. 5, pp. 1303–11, Jan 
1999. 
• [5] M Brown, M McNitt-Gray, J Goldin, R Suh, J Sayre, and D Aberle, “Patient-specific models 
for lung nodule detection and surveillance in ct images,” IEEE TMI, vol. 20, no. 12, pp. 1242 – 
1250, Dec 2001. 
• [6] K Suzuki, SG Armato III, F Li, S Sone, and K Doi, “Massive training artificial neural network 
(mtann) for reduction of false positives in computerized detection of lung nodules in low-dose 
computed tomography,” Medical physics, vol. 30, pp. 1602, 2003. 
• [7] D Paik, C Beaulieu, G Rubin, B Acar, R Jeffrey, J Yee, J Dey, and S Napel, “Surface normal 
overlap: a computer-aided detection algorithm with application to colonic polyps and lung 
nodules in helical ct,” IEEE TMI, vol. 23, no. 6, pp. 661 – 675, Jun 2004. 
• [8] Sumiaki Matsumoto, Harold L Kundel, James C Gee, Warren B Gefter, and Hiroto Hatabu, 
“Pulmonary nodule detection in ct images with quantized convergence index filter.,” Med 
Image Anal, vol. 10, no. 3, pp. 343–352, Jun 2006. 
• [9] N Memarian, J Alirezaie, and P Babyn, “Computerized detection of lung nodules with an 
enhanced false positive reduction scheme,” IEEE ICIP 2006, pp. 1921 –1924, Sep 2006. 
• [10] Jamshid Dehmeshki, Xujiong Ye, Xinyu Lin, Manlio Valdivieso, and Hamdan Amin, 
“Automated detection of lung nodules in ct images using shape-based genetic algorithm.,” 
Comput Med Imaging Graph, vol. 31, no. 6, pp. 408–417, Sep 2007. 
• [11] Onur Osman, Serhat Ozekes, and Osman N Ucan, “Lung nodule diagnosis using 3d 
template matching.,” Comput Biol Med, vol. 37, no. 8, pp. 1167–1172, Aug 2007. 
• [12] A El-Baz, G Gimel’farb, R Falk, and M Abo El-Ghar, “Automatic analysis of 3d low dose ct 
images for early diagnosis of lung cancer,” Pattern Recognition, vol. 42, no. 6, pp. 1041–1051, 
Jan 2009. 
• [13] JJ Suárez-Cuenca, PG Tahoces, M Souto, MJ Lado, M Remy-Jardin, J Remy, and J José 
Vidal, “Application of the iris filter for automatic detection of pulmonary nodules on 
computed tomography images,” Computers in Biology and Medicine, 2009. 
• [14] J Koza, “Genetic programming: On the programming of computers by means of natural 
selection,” The MIT Press, Jan 1992. 
• [15] S G Armato, G McLennan, M F McNitt-Gray, C R Meyer, D Yankelevitz, D R Aberle, C I 
Henschke, E A Hoffman, E A Kazerooni, H MacMahon, A P Reeves, B Y Croft, L P Clarke, and 
Lung Image Database Consortium Research Group, “Lung image database consortium: 
developing a resource for the medical imaging research community.,” Radiology, vol. 232, no. 
3, pp. 739–748, Sep 2004. 
Thank You 

Editor's Notes

  • #4 Pulmonary nodule detection is an attractive application of computer-aided detection (CAD) because lung cancer is the leading cause of cancer deaths in Korea. According to the statistics, lung cancer causes more deaths than any other cancer [1]. The detection of pulmonary nodules and the diagnosis of lesions in computed tomography (CT) images are important in the treatment of lung cancer. If lung cancer is detected at an early stage, the 3-year survival rate is more than 80%. Recently, researchers have developed a number of CAD methods for lung nodules to aid radiologists in identifying nodule candidates from CT images. Current CT technology allows for near isotropic, submillimeter resolution acquisition of the complete chest in a single breath hold. These thin-slice chest scans have become indispensable in thoracic radiology, but have also substantially increased the data load for radiologists. Automating the analysis of such data is, therefore, a necessity and this has created a rapidly developing research area in medical imaging. In the literature, several nodule detection methods have been proposed. Well-known examples include multiple gray-level thresholding, genetic algorithm template matching (GATM), rule-based linear discriminant analysis, a massive training artificial neural network based method, shape-based GATM, and a 3D template matching (3DTM) based algorithm [2–5].
  • #5 The GATM-based algorithm achieved quite good results. Lee et al. [2] proposed a template-matching technique based on GATM for detecting nodules within the lung area. Seventy-one nodules out of 98 were correctly detected, with approximately 1.1 FPs per sectional image.
  • #8 Lung region extraction must be performed before any other step of nodule detection. To extract the lung region, we propose a segmentation method based on adaptive thresholding and voxel labelling. Because the lung region is dark, we convert the image to a binary image in which pixels below the selected threshold are foreground.
  • #9 After that, we remove the rim, which is the outer part of the body, from the binary image at every slice of the CT images. However, many noisy regions remain, such as gas in the intestine. We therefore apply 18-connected voxel labelling (3D connected component labelling). After labelling, we calculate the volume of every connected component and select the two largest as the lung volume. Finally, we correct the contour of the lung volume because there may be nodules on the wall side of the lung. To correct this, the rolling ball algorithm [4] is applied to every slice of the lung volume. The red circles in Fig. 1 mark the wall-side nodule and the corrected wall-side nodule.
  • #10 Nodule candidate detection and feature extraction are important parts of the nodule detection scheme. They consist of extracting the regions of interest (ROIs), detecting the nodule candidates, and extracting 3D and 2D features of the nodule candidates. These features are provided as input to the GP module.
  • #11 To extract ROIs for nodule candidate detection, we propose an adaptive multiple thresholding method. Multiple thresholding is commonly used in the literature, but it is not adaptive and produces many grey-level steps. We therefore calculate an adaptive threshold value from the diagonal histogram at every slice of the lung volume. This value is the base threshold for multiple thresholding, and we derive five additional threshold values: base threshold +50, -50, -100, -150 and -200. A thresholded lung volume has 6 grey-level steps. Fig. 2a shows a slice of the extracted ROI.
  • #12 In this part, we remove the vessels and noise in the lung volume using a rule-based classifier. We extract 3D geometric features from every ROI: volume, elongation factor, compactness and approximated radius. A vessel is identified by its volume, elongation factor and compactness. A vessel is connected across slices, so its volume is much larger than that of a nodule. Moreover, it is longer than a nodule and is not a compact object. An ROI is removed as noise if its radius is smaller than 3 mm or larger than 30 mm. Fig. 3b shows the detected nodule candidates.
  • #13 Nodule candidates are detected from the segmented lung region. We extract 3D geometric features and 2D pixel-based features. Four 3D geometric features are already extracted during nodule candidate detection: volume, elongation factor, compactness and approximated radius. The 2D features are extracted from the median slice of each nodule candidate because the median slice has the largest area in the candidate volume. To extract 2D texture features, we normalize the image size of the nodule candidates: we divide them into 3 size classes and then extract the features. If the radius of the nodule candidate is less than 5 mm, the image matrix is 8x8. If the radius is between 5 mm and 10 mm, the image matrix is 16x16. The largest image matrix, 32x32, is used for radii greater than 10 mm and less than 20 mm. We extract 14 features from the image matrix: mean, variance, skewness, kurtosis, area, radius and the 8 largest eigenvalues. These features are provided as input to the GP module.
  • #15 Nodule candidate detection and feature extraction are important steps in the nodule detection scheme. This stage consists of extracting the region of interest (ROI), detecting the nodule candidates, and extracting the 3D and 2D features of the nodule candidates. These features are provided as input to the GP module.
  • #16 Pulmonary nodule detection is a binary classification problem, in which the outputs are labelled as positive or negative. Here, positive means nodule and negative means non-nodule. In pulmonary nodule detection, almost all nodule candidates are actually negative, so candidate detection produces many false positives. The goal of GP evolution is to reduce false positives (FP) and increase true positives (TP). In the proposed scheme, an optimized classifier is evolved from a combination of features and random constant values. It reduces FP while achieving a higher TP rate.
  • #18 The goal of GP evolution is to reduce false positives (FP) and increase true positives (TP). In the proposed scheme, an optimized classifier is evolved from a combination of features and random constant values. GP selects suitable features from all extracted features and combines the selected features with mathematical operators. GP generates individual classifiers, which are evaluated by a fitness function. The evolved GP expression converts the complex input features into a single value, which is easily classified as nodule or non-nodule.
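To illustrate how one GP individual combines features, constants, and operators into a single value, here is a minimal sketch. The nested-tuple tree encoding, the four arithmetic operators, the protected division, and the example feature values are assumptions made for illustration; the notes only say that features are combined with "mathematical operators".

```python
import operator

def protected_div(a, b):
    """Division that returns 1.0 for a near-zero denominator (a common GP convention)."""
    return a / b if abs(b) > 1e-9 else 1.0

# Assumed function set; the actual operators are not listed in the notes.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": protected_div}

def evaluate(tree, features):
    """Evaluate a GP individual on one candidate.

    A tree is a tuple (op, left, right), a feature name (str),
    or a numeric constant.
    """
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, features), evaluate(right, features))
    if isinstance(tree, str):
        return features[tree]
    return float(tree)

# Hypothetical individual: compactness * 2.0 - elongation / radius
individual = ("-", ("*", "compactness", 2.0), ("/", "elongation", "radius"))
candidate = {"compactness": 0.8, "elongation": 3.0, "radius": 6.0}
score = evaluate(individual, candidate)
```

Features that do not appear in the tree are simply ignored, which is how an evolved individual performs feature selection implicitly.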
  • #20 The ROC curve plots the TP rate against the FP rate for different threshold values. Az is the area under the ROC curve and is a good measure of classifier performance under different conditions. If we use only Az as the fitness function, we still achieve good TP and FP rates. However, GP then cannot produce a proper classification threshold. Therefore, we also use the true positive rate (TPR, sensitivity) and specificity (SPC) as parts of the fitness function. We use SPC instead of the false positive rate (FPR) because FPR is better when low, whereas the other indicators are better when high. SPC equals 1 - FPR and is also called the true negative rate (TNR). The fitness function is defined as the product of the three indicators: Az, sensitivity, and specificity. In the GP cycle, the fitness function evaluates the quality of each individual (classifier).
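The product fitness described above can be sketched as follows. This is an illustrative version under stated assumptions: Az is computed with the rank-based (Mann-Whitney) estimate, and TPR and SPC are measured at a threshold of 0 to match the wrapper that treats positive GP outputs as nodules. The function names and the label encoding (1 for nodule, 0 for non-nodule) are hypothetical.

```python
def auc(scores, labels):
    """Az estimated as the fraction of (nodule, non-nodule) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return 0.0
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def fitness(scores, labels, threshold=0.0):
    """Fitness = Az * TPR * SPC, where SPC = 1 - FPR."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s <= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s > threshold)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    spc = tn / (tn + fp) if tn + fp else 0.0
    return auc(scores, labels) * tpr * spc

# GP outputs for six candidates: three nodules (1) and three non-nodules (0).
scores = [2.0, 0.5, -1.0, -0.5, 1.0, -2.0]
labels = [1, 1, 1, 0, 0, 0]
value = fitness(scores, labels)
```

Because the three indicators are multiplied, an individual that ranks candidates well (high Az) but places its outputs on the wrong side of zero still receives a low fitness, which pushes evolution toward a usable classification threshold.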
  • #21, #22 GP evolution is controlled by the parameters shown in Table 1. All parameters are set to commonly used values. We use the ramped half-and-half method to generate the initial population. The output of GP is a real value, so a wrapper is needed to convert it into a class label: if the output of GP is positive, the nodule candidate is classified as nodule; otherwise it is classified as non-nodule. The GP run stops when the number of generations reaches the maximum limit. The best individual is obtained at the end of the GP run.
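Ramped half-and-half initialization and the output wrapper can be sketched as below. The function set, terminal names, constant range, depth range, and the 0.3 early-stop probability for grow trees are all assumptions made for illustration; the actual settings are those of Table 1, which is not reproduced in these notes.

```python
import random

FUNCTIONS = ["+", "-", "*", "/"]                 # assumed function set
TERMINALS = ["volume", "compactness", "radius"]  # illustrative feature names

def random_terminal(rng):
    """A terminal is either a feature name or a random constant."""
    if rng.random() < 0.5:
        return rng.choice(TERMINALS)
    return round(rng.uniform(-1.0, 1.0), 3)

def build_tree(depth, rng, full):
    """'full' trees reach the depth limit on every branch; 'grow' trees may stop early."""
    if depth == 0 or (not full and rng.random() < 0.3):
        return random_terminal(rng)
    return (rng.choice(FUNCTIONS),
            build_tree(depth - 1, rng, full),
            build_tree(depth - 1, rng, full))

def ramped_half_and_half(pop_size, min_depth, max_depth, rng):
    """Cycle through the depth range, alternating between full and grow trees."""
    depths = range(min_depth, max_depth + 1)
    population = []
    for i in range(pop_size):
        depth = depths[i % len(depths)]
        full = (i // len(depths)) % 2 == 0
        population.append(build_tree(depth, rng, full))
    return population

def depth_of(tree):
    """Depth of a tree; a lone terminal has depth 0."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(depth_of(tree[1]), depth_of(tree[2]))

def wrapper(gp_output):
    """Map the real-valued GP output to a class label."""
    return "nodule" if gp_output > 0 else "non-nodule"

population = ramped_half_and_half(12, 2, 4, random.Random(0))
```

Mixing depths and tree shapes in this way gives the initial population a variety of expression sizes, so evolution does not start from uniformly shallow or uniformly bushy classifiers.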
  • #25 The sensitivity of nodule candidate detection is about 100% and its FP rate is 0.9, so the nodule candidates contain many FPs. The results in Table 2 show the nodule detection rate (TPR), FPR, and Az for the three types of datasets. The FP rates of the three datasets are about 10% of the FP rate without GP.
  • #26 The ROC curves of the datasets are shown in Fig. 4. The proposed method achieved a 92% detection rate with 6.5 FPs per scan.
  • #27 We have proposed a novel pulmonary nodule detection algorithm for CT images. The lung region is segmented using adaptive thresholding and a voxel-labelling-based method. Nodule candidates are then detected using adaptive multiple thresholding and a rule-based classifier with 3D geometric features. Next, 3D and 2D features are extracted from the detected nodule candidates. Finally, the extracted features are optimized and classified into nodule and non-nodule using GP. We applied the proposed algorithm to the LIDC database of NCI. The method greatly reduced the FP rate: the number of FPs per scan is only 6.5, with more than 90% sensitivity. The results show the superiority of the proposed method.