The preprocessing of the mammogram images is carried out in two steps:

1. The image in RGB format is converted to a grayscale form.
2. The image is then subjected to histogram equalization.

3.1 Histogram Equalization

Histogram equalization considerably improves the quality of the image. The technique reduces the extra brightness and darkness in the images, and the distinct features of the image are enhanced by increasing the contrast range. Histogram equalization is the technique by which the dynamic range of the histogram of an image is increased [10]. The intensity values of the pixels in the input image are reassigned such that the output image contains a uniform distribution of intensities. Histogram equalization therefore results in a uniform histogram, and hence the contrast of the image is increased.

Histogram equalization is performed on an image in the following steps:

1. Histogram formation.
2. Calculation of new intensity values for each intensity level.
3. Replacement of the previous intensity values with the new intensity values.
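The three steps above can be sketched as follows. This is an illustrative implementation in Python with NumPy (the project itself used MATLAB), assuming an 8-bit grayscale input:

```python
import numpy as np

def histogram_equalize(gray):
    """Histogram-equalize an 8-bit grayscale image.

    Follows the three steps above: form the histogram, compute a new
    intensity value for each level from the cumulative distribution,
    then replace every pixel's value with its new value.
    """
    # Step 1: histogram formation (counts of each of the 256 levels).
    hist = np.bincount(gray.ravel(), minlength=256)
    # Step 2: new intensity values from the cumulative distribution,
    # stretched back over the full 0-255 dynamic range.
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    new_levels = np.clip(
        np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255), 0, 255
    ).astype(np.uint8)
    # Step 3: replace the previous intensity values with the new ones.
    return new_levels[gray]
```

Mapping through the cumulative distribution is what produces the approximately uniform output histogram described above.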
The steps to carry out the E step of the EM algorithm are as follows:

1. Initialize the means, covariances, and mixing coefficients.
2. Evaluate the initial value of the log likelihood.
3. Evaluate the responsibilities using the current parameter values.
The steps to carry out the M step of the EM algorithm are as follows:

1. Re-estimate the means, covariances, and mixing coefficients using the current responsibilities.
2. Evaluate the log likelihood and check for convergence of either the parameters or the log likelihood.
3. If the convergence criterion is not satisfied, return to the E step.
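In standard Gaussian-mixture notation, the responsibility and parameter-update equations used in the E and M steps take the following form:

```latex
% E step: responsibility of component k for data point x_n
\gamma(z_{nk}) =
  \frac{\pi_k \,\mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)}
       {\sum_{j=1}^{K} \pi_j \,\mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)}

% M step: re-estimation, where N_k = \sum_{n=1}^{N} \gamma(z_{nk})
\boldsymbol{\mu}_k^{\mathrm{new}} =
  \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})\, \mathbf{x}_n
\qquad
\boldsymbol{\Sigma}_k^{\mathrm{new}} =
  \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})\,
  (\mathbf{x}_n - \boldsymbol{\mu}_k^{\mathrm{new}})
  (\mathbf{x}_n - \boldsymbol{\mu}_k^{\mathrm{new}})^{\mathsf{T}}
\qquad
\pi_k^{\mathrm{new}} = \frac{N_k}{N}

% Log likelihood monitored for convergence
\ln p(\mathbf{X} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}, \boldsymbol{\pi}) =
  \sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^{K}
    \pi_k\, \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k) \right\}
```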
The EM algorithm takes more iterations to reach convergence, and each iteration requires more computation, than the K-means algorithm. Hence, it is common to use the K-means algorithm to find initial estimates of the parameters from the training data.

The K-means algorithm uses the squared Euclidean distance as the measure of dissimilarity between a data point and a prototype vector. This not only limits the type of data variables that can be considered but also makes the determination of the cluster means non-robust to outliers [3, 32]. The algorithm chooses the initial means randomly, and unit variances are assumed for the diagonal covariance matrix adopted in the current work.

5.3 MATLAB

MATLAB is a high-level technical computing language and an interactive environment for algorithm development, data visualization, data analysis, and numeric computation. Using MATLAB, technical computing problems can be solved faster than with traditional programming languages such as C, C++, and FORTRAN [17, 18]. MATLAB can be used in a wide range of applications, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology [21, 24]. The software for feature extraction and image classification was written in MATLAB 7.5.0.342 (R2007b).

The tools required for the project are the MATLAB computing software with the Image Processing Toolbox, the SVM and GMM toolboxes, and the Microsoft Excel platform. The preprocessed images were subsequently processed by a MATLAB graphical user interface (GUI) in sequence, from the radon transform to HOS, followed by Fisher Discriminant Analysis (FDA) [16, 19].

A GUI is a type of user interface that allows people to interact with programs in more ways than typing [15, 20].
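The K-means seeding described above can be sketched as follows. This is an illustrative Python/NumPy version (the project itself used MATLAB); it returns initial means, the unit diagonal variances adopted in this work, and mixing coefficients proportional to cluster sizes:

```python
import numpy as np

def kmeans_init(X, k, n_iter=20, seed=0):
    """Crude K-means used only to seed the GMM parameters.

    Returns initial means (k, d), unit diagonal variances (k, d), and
    mixing coefficients proportional to cluster sizes.
    """
    rng = np.random.default_rng(seed)
    # Choose k data points at random as the initial means.
    means = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to the nearest mean (squared Euclidean distance).
        d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute each mean; keep the old one if a cluster goes empty.
        for j in range(k):
            if np.any(labels == j):
                means[j] = X[labels == j].mean(axis=0)
    counts = np.bincount(labels, minlength=k)
    variances = np.ones((k, X.shape[1]))  # unit diagonal covariances
    weights = counts / counts.sum()       # initial mixing coefficients
    return means, variances, weights
```

EM then starts from these estimates instead of a fully random initialization, which typically reduces the number of EM iterations needed to converge.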
A GUI offers graphical icons and visual indicators, as opposed to text-based interfaces, typed command labels or text navigation, to represent the information and actions available to a user. The actions are usually performed through direct manipulation of the graphical elements.

Figure 5.1: An example of a GUI.

CHAPTER 6: RESULTS

| Parameter | Normal | Benign | Cancer | P-Value |
|---|---|---|---|---|
| P(1/19) | -0.864 ± 1.28 | 0.1516 ± 1.79 | -0.0376 ± 1.61 | 0.0002 |
| P(2/19) | 1.0981 ± 1.57 | 1.1265 ± 1.6 | 1.4368 ± 1.06 | 0.4 |
| P(3/19) | 1.5089 ± 1.42 | 1.8151 ± 0.98 | 1.6316 ± 1.13 | 0.29 |
| P(4/19) | 0.3443 ± 1.74 | 0.4227 ± 1.78 | 0.0970 ± 1.92 | 0.6 |
| P(5/19) | -0.7632 ± 1.01 | -0.3759 ± 1.19 | -0.5603 ± 1.37 | 0.12 |
| P(6/19) | -0.7471 ± 1.23 | -0.5153 ± 1.15 | -0.7047 ± 1.3 | 0.47 |
| P(7/19) | 0.4572 ± 1.82 | 0.3982 ± 1.91 | 0.8911 ± 1.68 | 0.29 |
| P(8/19) | 1.4963 ± 1.29 | 1.7038 ± 1.02 | 1.7279 ± 1.07 | 0.42 |
| P(9/19) | 1.2298 ± 1.21 | 1.3805 ± 1.05 | 1.0564 ± 1.51 | 0.36 |
| P(10/19) | 0.864 ± 1.28 | -0.1516 ± 1.79 | 0.0376 ± 1.61 | 0.0002 |
| P(11/19) | 0.7716 ± 1.34 | 0.2165 ± 1.87 | 0.2076 ± 1.74 | 0.063 |
| P(12/19) | 0.6337 ± 1.84 | 0.2321 ± 2.07 | 0.4104 ± 1.95 | 0.44 |
| P(13/19) | -1.3259 ± 1.44 | -1.2354 ± 1.39 | -0.9542 ± 1.72 | 0.38 |
| P(14/19) | -1.648 ± 0.978 | -1.4667 ± 0.88 | -1.5978 ± 1.01 | 0.48 |
| P(15/19) | -1.362 ± 1.1 | -1.3814 ± 1.07 | -1.5038 ± 1.1 | 0.75 |
| P(16/19) | -0.8233 ± 1.56 | -0.9813 ± 1.53 | -1.1006 ± 1.44 | 0.58 |
| P(17/19) | 0.1656 ± 2.04 | 0.2886 ± 2.02 | 0.3588 ± 1.99 | 0.86 |
| P(18/19) | -0.8224 ± 1.71 | 0.3290 ± 1.72 | 0.3787 ± 1.59 | <0.0001 |
| P(19/19) | -0.864 ± 1.28 | 0.1516 ± 1.79 | -0.0376 ± 1.61 | 0.0002 |

Table 6.1: Ranges of input features to the classifiers.

| Classes | Training | Testing | Results |
|---|---|---|---|
| Normal | 60 | 20 | 19 |
| Benign | 55 | 20 | 16 |
| Cancer | 35 | 15 | 15 |
| Average | | | 91.67% |

Table 6.2: Results of the SVM classifier.

| Classes | Training | Testing | Results |
|---|---|---|---|
| Normal | 60 | 20 | 19 |
| Benign | 55 | 20 | 16 |
| Cancer | 35 | 15 | 13 |
| Average | | | 86.67% |

Table 6.3: Results of the GMM classifier.

Table 6.1 shows the ranges of the 19 features fed as input to the SVM and GMM.
For the purpose of training and testing the classifiers, the database of 205 subjects is divided into two sets: a training set of 150 samples (60 normal, 55 benign and 35 cancer) and a test set of 55 samples (20 normal, 20 benign and 15 cancer). This can be seen in Tables 6.2 and 6.3. The samples were divided arbitrarily.

| Classifiers | TN | TP | FP | FN | Sensitivity | Specificity | Positive Predictive Accuracy |
|---|---|---|---|---|---|---|---|
| SVM | 19 | 31 | 1 | 4 | 88.57% | 95% | 96.88% |
| GMM | 19 | 29 | 1 | 6 | 83.33% | 93.33% | 96.67% |

Table 6.4: Sensitivity, specificity and positive predictive accuracy (PPA) of the SVM and GMM classifiers.

TP = number of true positive specimens. FP = number of false positive specimens. FN = number of false negative specimens. TN = number of true negative specimens.

The sensitivity of a test is the probability that it will produce a TP result when used on an infected population. The sensitivity of a test can be determined by calculating [44]:

Sensitivity = TP / (TP + FN)

The specificity of a test is the probability that it will produce a TN result when used on a non-infected population. The specificity of a test can be determined by calculating:

Specificity = TN / (TN + FP)

The PPA of a test is the probability that a person is infected when a positive test result is observed. The PPA of a test can be determined by calculating:

PPA = TP / (TP + FP)

The SVM classifier is able to classify the three stages with an average accuracy of 91.67%, as shown in Table 6.2. This classifier shows a sensitivity of 88.57%, a specificity of 95% and a PPA of 96.88%, as seen in Table 6.4. It can also be seen from Table 6.2 that one normal case was wrongly classified as abnormal, and that the SVM classified the cancer class better than the other classes.

As for the GMM classifier, it is able to classify the three stages with an average accuracy of 86.67%, as shown in Table 6.3. This classifier shows a sensitivity of 83.33%, a specificity of 93.33% and a PPA of 96.67%, as seen in Table 6.4.
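The metrics reported in Table 6.4 can be reproduced directly from the TP/FP/TN/FN counts. A small Python sketch (the project itself used MATLAB), shown here with the SVM counts from Table 6.4:

```python
def classifier_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity and positive predictive accuracy (PPA),
    each returned as a percentage."""
    sensitivity = 100.0 * tp / (tp + fn)  # TP rate on the infected population
    specificity = 100.0 * tn / (tn + fp)  # TN rate on the non-infected population
    ppa = 100.0 * tp / (tp + fp)          # P(infected | positive test result)
    return sensitivity, specificity, ppa

# SVM row of Table 6.4: TN = 19, TP = 31, FP = 1, FN = 4
sens, spec, ppa = classifier_metrics(tp=31, fp=1, tn=19, fn=4)
print(f"{sens:.2f}% {spec:.2f}% {ppa:.2f}%")  # prints "88.57% 95.00% 96.88%"
```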
It can also be seen from Table 6.3 that one normal case was wrongly classified as abnormal, and that the GMM classified the normal class better than the other classes.

It can be seen from Table 6.4 that the SVM classifier has higher sensitivity, specificity and PPA than the GMM classifier.

CHAPTER 7: DISCUSSION

One of the commonly missed signs of breast cancer is architectural distortion. Fractal analysis and texture measures have been applied for the detection of architectural distortion in screening mammograms taken prior to the detection of breast cancer. Gabor filters, phase portrait modeling, fractal dimension (FD) and texture features have been used for the analysis.

The classification of the three kinds of mammograms (normal, benign and cancer) was studied. The features were extracted from the raw images using image processing techniques and fed to the two classifiers, SVM and GMM, for comparison. Sensitivity and specificity of more than 90% were demonstrated for these classifiers.

Supervised and unsupervised methods of segmentation in digital mammograms showed higher misclassification rates when only intensity was used as the discriminating feature. However, with additional features such as window means and standard deviations, methods such as the k-nearest neighbor (k-NN) algorithm were able to significantly reduce the number of mislabeled pixels with respect to certain regions within the image.

A method for detecting tumors using the watershed algorithm, and further classifying them as benign or malignant using a Watershed Decision Classifier (WDC), was proposed previously. The experimental results show that the proposed method was able to classify tumors as benign or malignant with an accuracy of 88.38%.

A fast method for the detection of circumscribed masses in mammograms, employing a radial basis function neural network (RBFNN), was presented previously.
This method distinguishes between tumorous and healthy tissue among various parenchymal tissue patterns, decides whether a mammogram is normal or not, and then detects the position of the masses by performing sub-image windowing analysis.

Multi-scale statistical features of the breast tissue were evaluated and fed into the RBFNN to find the exact location and classification. The RBFNN was able to classify an average of 75% of the unknown cases correctly.

In this work, I have used HOS features from the mammogram images to classify breast abnormalities into the three classes, with an accuracy of up to 91.67%.

The accuracy of the system can be further increased by increasing the size and quality of the training data set. It can also be increased by extracting more HOS features, i.e. by decreasing the angle step in the radon transform from 20º to 10º or 5º.

The classification results can be enhanced by extracting the proper features from the mammogram images. Environmental conditions, such as the reflection of light, influence the quality of the images and hence the classification efficiency.

CHAPTER 8: CONCLUSION

I have developed an automated technique for the quantitative assessment of breast density from digitized mammograms using HOS and data mining techniques. I have extracted 19 bispectrum invariant features from the mammograms. These features capture the variation in the shapes and contours in the images. They are fed into the SVM and GMM classifiers as a diagnostic tool to aid in the detection of breast abnormalities.

However, these computer-aided tools generally do not yield results of 100% accuracy. The accuracy of the tools depends on several factors, such as the size and quality of the training data set, the rigor of the training imparted and the input parameters chosen.

I can conclude that SVM provides a higher classification accuracy than GMM.
My SVM classification system produces encouraging results, with more than 90% for the combined sensitivity and specificity rates. With better features, the robustness of the diagnostic system can be improved to detect breast cancer at the early stages and hence save lives.

CHAPTER 9: CRITICAL REVIEW

9.1 Criteria and Targets

| Criteria/Targets | Deadline | Completion |
|---|---|---|
| Project Selection and Approval | 10 Jan 2009 | Achieved |
| Drafting a Project Plan with Supervisor | 02 Feb 2009 | Achieved |
| Gathering of Information | 15 Mar 2009 | Achieved |
| Summarising and Collating all Information | 23 Mar 2009 | Achieved |
| Literature Review | 28 Sep 2009 | Achieved |
| Proposal Report Submission | 27 Feb 2009 | Achieved |
| Familiarising with MATLAB | 22 Mar 2009 | Achieved |
| Selecting and Implementing Methods and Techniques | 12 Apr 2009 | Achieved |
| Interim Report Submission | 30 Apr 2009 | Achieved |
| Carrying out Software Investigation | 10 May 2009 | Achieved |
| Testing and Implementing | 21 Jun 2009 | Achieved |
| Evaluating Software | 09 Aug 2009 | Achieved |
| Thesis and Poster Submission | 09 Nov 2009 | Achieved |
| Oral Presentation | 28 Nov 2009 | On Going |
| Poster Presentation | 28 Nov 2009 | On Going |

Table 9.1: Criteria/targets and achievements.

9.2 Project Plan

9.3 Strengths and Weaknesses

Strengths:
• Familiar with MATLAB programming through the Digital Systems and Signal Processing, Biomedical Instrumentation, and Linear Systems Analysis and Filter Theory Design modules in UniSIM.
• Exposed to programming during the Final Year Project (FYP) in Temasek Polytechnic (TP), which was a joint project with the National Kidney Foundation (NKF) of Singapore. This gave me exposure to research involving human subjects.
• Familiar with image processing through the Image Analysis and Medical Imaging and Visualisation modules in UniSIM.
• Learnt presentation and report writing skills through the Technical Communication Skills module in TP and the Biomedical Ethics and Communication module in UniSIM.
Weaknesses:

• No experience in employing HOS and data mining techniques.
• No experience in thesis writing, as the FYP in TP only involved report writing.
• Time constraints due to a tight work schedule and the study of other modules in UniSIM.
9.4 Priorities for Improvement

• Maximise time management by getting priorities right.
• Make full use of the semester break between May and July for the capstone project.
• Read up and practise more to become familiar with HOS, SVM, GMM and MATLAB techniques and applications.
• Read up on the considerable amount of work done by others in this field worldwide, in order to justify that this project adds to the body of knowledge.
REFERENCES

17. Acharya U.R., Wong L.Y., Ng E.Y.K., Suri J., "Automatic identification of anterior segment eye abnormality", Innovations and Technology in Biology and Medicine (ITBM-RBM), France, 2007 (in press).
18. Acharya U.R., Chua K.C., Ng E.Y.K., Yu W., Chee C., "Application of higher order spectra for the identification of diabetes retinopathy stages", Mar 2008.
19. Acharya U.R., Ng E.Y.K., Chang Y.H., Yang J., Kaw G.J.L., "Computer-based identification of breast cancer using digitized mammograms", Apr 2008.
20. Byng J.W., Yaffe M.J., Jong R.A., Shumak R.S., Lockwood G.A., David M.M., Tritchler D.L., Boyd N.F., "Analysis of mammographic density and breast cancer risk from digitized mammograms", RadioGraphics, 18, 1998, pp. 1587-1598.
21. Cahoon T.C., Sutton M.A., Bezdek J.C., "Breast cancer detection using image processing techniques", The Ninth IEEE International Conference on Fuzzy Systems, 2, 2000, pp. 973-976.
22. Chandran V., Carswell B., Boashash B., Elgar S., "Pattern recognition using invariants defined from higher order spectra: 2-D image inputs", IEEE Transactions on Image Processing, 6, pp. 703-712, 1997.
23. Chandran V., Elgar S., "Pattern recognition using invariants defined from higher order spectra of one-dimensional inputs", IEEE Transactions on Signal Processing, 41, pp. 205-212, 1993.
24. Davis L.S., "Shape matching using relaxation techniques", IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 60-72, Jan 1979.
25. Dempster A.P., Laird N.M., Rubin D.B., "Maximum likelihood from incomplete data via the EM algorithm", Journal of the Royal Statistical Society, Series B, 39(1), pp. 1-38, 1977.
26. Gonzalez R.C., Wintz P., "Digital Image Processing", 3rd edn., Addison-Wesley, Reading, MA, 2002.
27. http://en.wikipedia.org/wiki/International_Agency_for_Research_on_Cancer [last accessed 28 Oct 2009].
28. http://en.wikipedia.org/wiki/World_Health_Organization [last accessed 28 Oct 2009].
29. http://en.wikipedia.org/wiki/Radon_transform [last accessed 28 Oct 2009].
30. http://marathon.csee.usf.edu/Mammography/Database.html [last accessed 28 Oct 2009].
31. http://users.rowan.edu/~shreek/networks1/MATLABintro.html#File Input/output [last accessed 28 Oct 2009].
32. http://www.math.upenn.edu/~cle/m584/MATLAB/ws2.html [last accessed 28 Oct 2009].
37. http://www.mathworks.com/products/MATLAB/description1.html?s_cid=ML_a1008_desintro [last accessed 28 Oct 2009].
38. http://www.medcalc.be/ [last accessed 28 Oct 2009].
39. http://www.physics.csbsju.edu/stats/anova.html [last accessed 28 Oct 2009].
40. http://www.psatellite.com/matrixlib/api/MATLAB.html [last accessed 28 Oct 2009].
41. Kerlikowske K., Grady D., Rubin S.M., Sandrock C., Ernster V.L., "Efficacy of screening mammography: a meta-analysis", JAMA, 273, 1995, pp. 149-154.
42. Kurz L., Benteftifa M.H., "Analysis of Variance in Statistical Image Processing", Cambridge University Press, July 1997.
43. Martin K.D., Kim Y.E., "Musical instrument identification: a pattern-recognition approach", presented at the 136th Meeting of the Acoustical Society of America, Norfolk, VA, October 1998.
44. Nandi A.K., Rob, "Estimation of third order cumulants in application of higher order spectra", IEE Proceedings, Part F, 140(6), pp. 380-389, 1993.
45. Nikias C.L., Petropulu A.P., "Higher-Order Spectra Analysis: A Nonlinear Signal Processing Framework", PTR Prentice Hall, Englewood Cliffs, NJ, 1993.
46. Nikias C.L., Raghuveer M.R., "Bispectrum estimation: a digital signal processing framework", Proceedings of the IEEE, 75(7), pp. 867-891, 1987.
47. Oppenheim A.V., Lim J.S., "The importance of phase in signals", Proceedings of the IEEE, 69, pp. 529-541, 1981.
48. Seo C., Lee K.Y., Lee J., "GMM based on local PCA for speaker identification", Electronics Letters, 37, pp. 1486-1489, 2001.
49. Udantha R.A., Athina P.P., John M.R., "Higher order spectra based deconvolution of ultrasound images", May 1995.
50. Vinod C., Daryl N., Sridha S., "Speaker identification using higher order spectral phase features and their effectiveness vis-a-vis Mel-cepstral features", July 2004.
51. Yuan S., Mehmet C., "Higher-order spectra (HOS) invariants for shape recognition", Sep 2000.
52. Zaïane O.R., Antonie M.L., Coman A., "Mammography classification by an association rule-based classifier", International Workshop on Multimedia Data Mining, 2002.
53. Zheng B., Mengdao X., "Logarithm bispectrum-based approach to radar range profile for automatic target recognition", Nov 2001.