Successfully reported this slideshow.

Model

291 views

Published on

Model

  1. 1. Using Some Data Mining Techniques for EarlyDiagnosis of Lung CancerDr. Zakaria Suliman Zubi, Miss Rema Asheibani Saad,Associate Professor , M.Sc. Student,Computer Science Department, Computer Science Department,Faculty of Science , Faculty of Science ,Sirte University, Alfateh University,Sirte ,Libya. Tripoli ,Libya.
  2. 2. Paper Outline Introduction. Objectives of the paper. Data Mining Task. Data Set. Methods and Models. Conclusion.
  3. 3. Introduction Lung cancers classified into two main categories: – Small-cell lung cancers (SCLC), which report for about 20% of cases. – Non-small-cell lung cancers (NSCLC), which report for the other 80%. Non-small-cell lung cancers include squamous cell carcinomas (35% of all lung cancers), adenocarcinomas (27%) and large cell carcinomas (10%). Medical images as an essential part of medical diagnosis and treatment . Images include both projection X-Ray chest film and cross-sectional images. Lung X-ray chest films considered as the most reliable method in early detection of lung cancers. Image recognition mining deals with the extraction of image patterns from a large collection of images stored in particular multimedia databases. Data mining will be used to classify the digital X-ray chest films in two categories: normal and abnormal.
  4. 4. Paper Outline Introduction. Objectives of the paper. Data Mining Task. Data Set. Methods and Models. Conclusion.
  5. 5. Objectives of the paperThe aims of this paper is to pointe out the following: Using some data mining, techniques such as neural networks and association rule mining techniques to detect early Lung Cancer and to classified it by using X-Ray chest films image. We will use the data of 300 x-ray chest films as a dataset in our classification system. To classify the digital X-ray chest films into two categories: normal and abnormal. The normal ones are those characterizing a healthy patient. The abnormal ones could have any type of lung cancer. Helping physicians to decide an important decisions on a particular patient state.
  6. 6. Paper Outline Introduction. Objectives of the paper. Data Mining Task. Data Set. Methods and Models. Conclusion.
  7. 7. Data Mining Task Data Preprocessing: Feature Extraction: Rule Generation:
  8. 8. Data Preprocessing 8 Normalization techniques are necessary to combine the different image formats to a regular format. Data preparation modifies images to present them in an appropriate format for transformation techniques. The image will be transformed in order to obtain a compressed (lossless) representation of it. The segmentation step finds consequent regions within an image,since item sets are extremely large .
  9. 9. 9 Data Cleaning is the process of cleaning the data by removing noise. The first step toward noise removal was pruning the images with the help of the crop operation in Image Processing. image after apply cropping operation image before apply preprocessing techniques
  10. 10. 10 Histogram Equalization increases the contrast range in an image by increasing the dynamic range of grey levels. and at the same time highlights the image features. The noise removal step was necessary before this enhancement because, otherwise, it would also result in enhancement of noise. The median filter is a nonlinear filter, used in order to eliminate the high-frequency filter without eliminating the significant characteristics of the image.
  11. 11. 11 We used 3x3 mask, which is centered on each image pixel, replacing each central pixel by the mean of the nine pixels covering the mask. image after apply image after cropping operation apply median filter
  12. 12. 12 As the low-contrast images histogram is narrow and centred towards the middle of the gray scale, by distributing the histogram to a wider range will improve the quality of the image. image after apply Histogram Equalization Histogram of the image after apply histogram equalization
  13. 13. Data Mining Task Data Preprocessing: Feature Extraction: Rule Generation:
  14. 14. 14Feature Extraction Images usually have a huge number of features, there are optimal feature set from a large number of features. The feature extraction phase is required in order to build the transactional database to be mined. Image processing algorithms used, which automatically extract image attributes such as local color, global color, texture, and structure. We localize the extraction process to very small regions in order to ensure that we capture all areas.
  15. 15. 15Our system consists of four processing steps:1. Location of tumor candidates by using adaptive ring filter.2. Extraction of the boundaries of tumor candidates.3. Extraction of feature parameters.4. Discrimination between the normal and the abnormal regions.
  16. 16. 16 The boundary of the candidate is predictable by using a two- step process: In the first step, Iris filter is used to approximation the fuzzy boundary. Then; SNAKES algorithm is applied to the output image of the Iris filter to obtain the boundary of the tumor candidate. It is called a suspicious region (SR).lung region on chest x ray after applied Iris filter.
  17. 17. 17 Adaptive ring filter is working to extract tumor candidates. lung region on chest x rayafter applied Adaptive ring filter The discrimination between the normal and the abnormal regions is performed using a statistical method such as distance measure.
  18. 18. Data Mining Task Data Preprocessing: Feature Extraction: Rule Generation:
  19. 19. Rules Generation 19 In an extremely knowledge based domain associated with a domain knowledge we can progress some data-mining tasks to improve decision support services. This data integration is an important notion because medical images are not self contained. We suppose that association rules of two forms such as: – Image contents dissimilar to spatial relationships, e.g., if an image has a texture X, that it is likely containing protrusion Y; – Image contents associated to spatial relationships, e.g., if X is among Y and Z it is likely there then there is a T beneath. A low minimum support and high minimum confidence is pleasing, since few image datasets have high support value.
  20. 20. Paper Outline Introduction. Objectives of the paper. Data Mining Task. Data Set. Methods and Models. Conclusion.
  21. 21. 21 Data SetIn our paper, we considered the 300 x-ray chest filmsmultimedia database as a training dataset used in ourclassification system. We also consider 70 percent as a training value of the systems and 10 percent for testing it. Ten splits of the data collection are considered to compute all the results in order to obtain a more accurate result of the system potential.
  22. 22. Paper Outline Introduction. Objectives of the paper. Data Mining Task. Data Set. Methods and Models. Conclusion.
  23. 23. 23 Methods and Models There are several kinds of data mining methods; some of the major data mining methods are known as deviation detection, summarization, classification, generalized rule induction and clustering. In this paper work, we use classification and generalized, neural network and association rule mining . This methods aims to identify the characteristics that indicate the group to which each case belongs.
  24. 24. 24 Classification and Generalized Data mining generates classification models by investigative already classified data (cases) and inductively finding a predictive pattern. In recent years, many advanced classification approaches, such as neural networks, fuzzy-sets, and expert systems, have been widely applied for image classification. In most cases, image classification approaches grouped as supervised and unsupervised machine learning approaches, or parametric and nonparametric, or hard and soft (fuzzy) classification, or per-pixel, subpixel, and per field. The most regularly used non-parametric classification approaches are neural networks, decision trees, support vector machines, and expert systems.
  25. 25. 25 Classification and Generalized Neural Network Association Rule Mining
  26. 26. Neural Network 26Especially, the neural network approach has been widelyadopted in recent years. The neural network has several advantages, including its nonparametric nature, arbitrary decision boundary capability, easy adaptation to different types of data and input structures,… etc Neural networks are of particular interest because they offer a means of efficiently modeling large and complex problems in which there may be hundreds of predictor variables that have many interactions. Neural nets may used in classification problems (where the output is a categorical variable) or for regressions (where the output variable is continuous).
  27. 27. 27ANN is capable technique in image recognition. In our work we will use the Feed forward neural network topology, because normally only feed- forward networks are used for pattern recognition. feature that extracted from image sends to neural network.
  28. 28. 28
  29. 29. 29 Classification and Generalized Neural Network Association Rule Mining
  30. 30. Association Rule Mining 30 Association rule mining has been commonly investigated in data mining literatures. Many proficient algorithms have been proposed, one of the most popular algorithms are called Apriori and FP-Tree growth Association rule mining. " Given a set of transactions D = {T1,…… ,Tn} and a set of items I = {I1,I2……., im} such that any transaction T in D is a set of items in I, an association rule is an implication A → B where the antecedent A and the consequent B are subsets of a transaction T in D, and A and B have no common items. For the association rule to be acceptable, the conditional probability of B given A has to be higher than a threshold called minimum confidence "
  31. 31. 31 Association rules mining is usually a two-step process: 1. Wherein the first step frequent item-sets are discovered (i.e. item-sets whose support is no less than a minimum support) . 2. The second step, association rules derived from the frequent item sets. Most algorithms used to identify large itemsets can classify as either sequential or parallel. Most algorithms used to identify large itemsets can classify as either sequential or parallel.
  32. 32. 32In our approach, we used the apriori algorithm in order todiscover association rules among the features extracted fromthe x-ray chest films database and the category to which eachx-ray chest films belongs too. The Apriori algorithm called also as "Sequential Algorithm" developed by [Agrawal1994] ,It is also the most well known association rule algorithm. This technique uses the property that any subset of a large itemset must be a large item set it self. The primary differences of this algorithm from the AIS and SETM algorithms are the way of producing candidate itemsets and the selection of candidate itemsets for counting.
  33. 33. Paper Outline Introduction. Objectives of the paper. Data Mining Task. Data Set. Methods and Models. Conclusion.
  34. 34. 34 Conclusion In this paper, we used some data mining, techniques such as neural networks and association rule mining techniques to detect and classify Lung Cancer in X-Ray chest data films. We considered the 300 x-ray chest films multimedia database as a training dataset used in our proposed classification system. From these set of images we considered 70 percent as a training value of the systems and 10 percent for testing it. Ten splits of the data collection will be considered to compute all the results in order to obtain a more accurate result of the system potential. We classified the digital X-ray chest films in two categories: normal and abnormal. We used some procedures as a essential part to the task of medical image mining, these procedures are: Data Preprocessing, Feature Extraction and Rule Generation. In this paper also, we used classification and generalized, neural network and association rule mining induction methods in order to classify problems aim to identify the characteristics that indicates the group to which each case belongs to.
  35. 35. 35
  36. 36. 36Thank you !!!
  37. 37. 37

×