SlideShare a Scribd company logo
1 of 4
Integrated Intelligent Research (IIR) International Journal of Data Mining Techniques and Applications
Volume: 04 Issue: 02 December 2015 Page No.63-66
ISSN: 2278-2419
63
A Pioneering Cervical Cancer Prediction
Prototype in Medical Data Mining using
Clustering Pattern
R.Vidya1
,G.M.Nasira2
1
Research scholar of M.S.University,Tirunelveli, Assistant Professor /Department of Computer Science
St.Joseph’s college, Cuddalore, India
2
Assistant Professor/Department of Computer Science, ChikkannaGovernment College for Women, Tirupur, India
Email: vidya.sjc@gmail.com, nasiragm99@yahoo.com
Abstract—Let us not make the cure of the disease more
unbearable than the disease itself this quote is the most durable
and inspirational line of medicine field. Data mining is said to
be an umbrella term which refers to the progression of finding
out the patterns in data. This can be even succeeded typically
with an assistance of authoritative algorithm to automate
search (as a part). This paper reveals out, how the C2P
(Cervical Cancer Prediction) model is approached by a data
mining algorithm for prediction. The prediction of C2
(Cervical Cancer) has been a challenging problem in research
field. In the Data mining applications, we are utilizing RFT
(Random Forest Tree) algorithm to do the prediction. To the
best of our knowledge, we use popular clustering K-means
technique to achieve more accuracy.
Keywords— Cervical Cancer, Random Forest Tree, K-means
learning, Data mining, Clustering K-means.
I. INTRODUCTION
Cancer of the cervix is one of the most prevalent forms of
cancer worldwide, the major burden of the disease being felt in
developing countries like India. Cervical carcinoma still
continues to be the most common cancer among women and
accounts for the maximum cancer deaths each year. Persistent
infections with High-Risk (HR) Human Papilloma viruses,
such as HPV 16,18,31,33 and 45 have been identified as a
major development of the distance [8]. This model
interpretability and prediction accuracy provided by Random
Forest is very unique among popular machine learning
methods. Accurate predictions and better generalizations are
achieved due to utilization of ensemble strategies and random
sampling.Random Forest three main features that gained focus
are: (i) Accuratepredictions results for a variety of
applications. (ii) Through model training, the importance of
each feature can be measured (iii) Trained model can measure
the pair-wise proximity between the samples. The research of
our paper is as followed by cervical cancer and diseases and
then proceeded by medical data mining and RFT where we
deal with K-Mean algorithm in the next phase. Followed by
this we have given out our C2
P Model along with how to
prevent cancer. Later the paper concludes by Result and
conclusion. Characterized by abnormal bleeding, pelvic pain
and unusual heavy discharge, the disease develops in the
tissues of the cervix, a part connecting the upper body of the
uterus to the vagina.
II. CERVICAL CANCER DISEASE AND ITS SYMPTOMS
In worldwide C2
is the third most common cancer in women
and seventh overall. Majority of this global burden is felt in
low and middle income developing countries (WHO, 2010)
and in low socio-economic groups within countries [1].It
comprises of endocervix or the upper part which is close to the
uterus and covered by glandular cells; and the ecocervix, the
lower part which is close to the vagina and covered by Seamus
cells. The two regions of the cervix meet at the
“transformation zone”. It is this region where most cervical
cancer begins to develop [6].
Figure 1. Cervix Cancer with its Cells (Developing Stage)
A. Signs & Symptoms of Cervical Cancer
Early cervical cancer has no symptoms and it cannot be
identified in a clear way. Symptoms do not begin until the pre-
cancer grows to an aggressive stage and starts to spread to
nearby tissue. When this happens the most common symptoms
are:
Figure 2. Cancer Cells in Progressive Stage
Integrated Intelligent Research (IIR) International Journal of Data Mining Techniques and Applications
Volume: 04 Issue: 02 December 2015 Page No.63-66
ISSN: 2278-2419
64
 Vaginal bleeding abnormally primarily bleeding after sex,
after menopause, spotting between periods, long or heavy
periods than normal, bleeding after pelvic exam.
 Unusual discharge from vagina- the discharge may
contain blood.
 Vaginal intercourse
 These symptoms can be caused other than cervical cancer
too but anyway having a check on it will be good.
III. MEDICAL DATA MINING AND RFT
RFT is best among classification which is able to classify and
predict large amount of data with accuracy[11]. Random
Forest Tree gives an opportunity to set parameter there is no
need to prune tree. Accuracy and variable will be generated
automatically. The outliers are very sensitive in training data.
Different subset of training dataset are selected (2/3) with
replacement to train each tree. Remaining training data (OOB)
are used to estimate error and variable importance class
assignment is made by the number of votes from all of the
trees and for regression the average of the result is used.
Figure 3. Structure of RFT
RFT algorithm was developed by Leo Breiman and
Adelecutler. RFT grows into many classification trees.Each
tree is grown as follow [10].
 If number of training set cases set are N, Sample N are
collected at random but with the replacement from
original data. This sample will be training tree
 M input variables a number MM is specified such that
each node, m variable are selected at random out of M
 Best split on this m is used in split the node. The value of
m is detained continuously during the forest growing
 Every tree is grown to the largest extent possible so there
is no pruning
IV. CLUSTERING K-MEANS ALGORITHM
Whitening is a common preprocess in deep learning work
which provides better performance in respect to the choice of
unsupervised learning algorithm. Preprocessing is a common
practice forperforming many simple normalization steps
before attaining its final stage[4]. We apply a K-Mean
clustering to learn K centroids C(k)
from the input data. Given
the learned centroidsC(k,
we consider two choices for the
feature mapping f,
This is called K-means hard.The second is a non-linear
mapping that attempts to be “softer” than the above encoding
but also yield sparse outputs through simple arithmetic,
Is the mean of the elements of z. This is called K-means
triangle[3].
K-Means Algorithm Unsupervised-Learning-The k-means
algorithm is build upon the following operations
Step 1-Choose initial cluster centers C1, C2, …, CK randomly
from the n points.W1,W2,WN, WI € Rm.
Step 2-Assign point Wi, I=1,2,…,N to cluster Zj= 1,2,…K if
and only if ||wq – cq|| , P=1,2,…K and J ≠ P Ties are resolved
arbitrarily.
Step 3-Compute the new cluster centers
C1
*, C
2
*,
…,Ck
*
as following
C1
*
= (I/n) Wj I = 1,2,…K
Wj ZJ
Step 4-If Ci
*
= Ci, I = 1,2,…K then terminate otherwise CiCj
*
and go to step 2.
Data mining provides the methodology and technology to
analysis the useful information of data[3]. Clustering is a way
that classifies the raw data reasonably and searches the hidden
patterns that may exist in datasets. [7] K-mean is a greedy
algorithm; can only converge to a local minimum clustering
algorithm is divided into two categories. Partition and
hierarchical clustering algorithm K-means partitioned one [2]
Proposed by the queen in 1967. K-means is numerical
unsupervised non-deterministic interactive method [5].
Figure 4. Process of K-means Learning
This table 1 gives the result of Logistic analysis for regression.
In total 11 risk factors are included, the main factors which
gives up to fatal rise of problems like Illiteracy, Married Life,
Integrated Intelligent Research (IIR) International Journal of Data Mining Techniques and Applications
Volume: 04 Issue: 02 December 2015 Page No.63-66
ISSN: 2278-2419
65
Early Menarche, Multiple Pair and Poor Hygiene. Using the
linear transform the statistical weight for the risk factors is
identified. The statistical weight analysis as followed with
figure 6 of statistical weight of risk factor. Table 2 shows the
classification of data subjects which thethe identification of
cases and controls. The total risk category is used on a score of
7 with the interval limit. The classification based on control is
as follow
V. C2
P MODEL – DATA SET AND RISK GROUP DETECTION
Risk factors can be classified in to four categories. Category I
contains white discharge, hip-pain, smelly vaginal discharge,
early marriage, sex at early age and multiple sexual partners.
Category II includes risk factors likely malnutrition, male
sexual behaviors, husband’s food habit and living condition.
Category III – (psychological factors) Category IV – (age,
family history)[9].
Architecure of C2
P model
VI. PREVENTING CANCER COUNT ON
The cervical cancer does not have any symptoms until it
reaches an aggressive stage. So a minor problem should be
taken into account like
 Unbalanced vaginal Bleeding
 Vaginal Discharge with Odor
 Vaginal Discharge in a Watery form
 Vaginal Discharge along with blood
A. Preventive steps
 General education for women population about life style
factors.
 Risk factor screening in general
 Collecting information like age, family history,
menopause period & husband’s activities.
 Collecting results of laboratory texts including Pap smear
test and biopsy test.
 Collecting direct investigation reports like VILLA etc.
 Giving counsel to go the nearest hospital for the screening
regularly C2 dataset is taken from open source data
online. The data are then clustered with k-means
algorithm using MATLAB.
TABLE I. RESULT OF NUMEROUS LOGISTIC REGRESSION ANALYSIS AND STATISTICAL WEIGHTINESS OF THE SIGNIFICANT RISK FACTOR
Risk Factor Regression Co-Efficient Odds Ratio 95% of RC for OR Statistical Weight
Literacy 1.2345 3.527 2.207 -5.556 13
Married Life 0.3385 1.699 1.439 -3.494 7
Early Menarche 0.9795 1.402 0.889-2.212 5
Marital Status 0.3327 2.663 0.708-1.873 2
Genetal Infection 0.1410 2.145 0.453-1.190 5
Multiple Pair 0.0675 1.151 1.354-3.398 7
Abortion 0.0448 1.264 0.660-1.732 2
Tobacco Use 0.637 0.734 1.693-4.190 6
Poor hygiene 0.5299 1.499 0.708-1.873 10
Figure 5. Statistical Weight of Risk Factor
Figure 6. Score Control Analysis for Case
VII. RESULT
Score Cases (n=230) Controls (n=230)
0-7 20 76
8-14 50 42
15-21 20 34
22-28 50 40
28-35 56 20
36-42 61 18
Integrated Intelligent Research (IIR) International Journal of Data Mining Techniques and Applications
Volume: 04 Issue: 02 December 2015 Page No.63-66
ISSN: 2278-2419
66
With RFT- data mining algorithm we achieved 93.37%
accuracy. But, with enhancement, i.e., with k-means learning
we achieved 97.37% result in our research work.
TABLE II. DATA MINING ACCURACY
SL.NO DATA-MINING
ALGORITHM
ACCURACY
1 RFT 93.54
2 RFT WITH K-MEANS
LEARNING
96.77
Figure 7. Result of accuracy using RFT
Figure 8. Architecure of C2
P model
VIII. CONCLUSION
Cervical cancer ranks as the first most frequent cancer among
women in India. Tree predictors make a combination where
every tree depends on the random vector sampled value
independently with the same distribution for all the trees is
said to be Random Forest. It is a perfect tool for building
guesses considering overfit if there or not so to avoid large
numbers. In this work, K-means learning was introduced to
achieve better accuracy.Experimental results conducted on the
collected cervical cancer data set demonstrated the
effectiveness of the proposed techniques. Particularly, the
proposed method mainly aims to predict the cancer cervix
which is usually done by data mining algorithm. This paper is
expected to be of benefit for medical decision making systems
to give an alternative choice for medical practitioners to
construct more accurate predictive models and stronger
classifiers.We are bound to thank the Doctors of the JIPMER
Hospital, Puducherry for their information. It was very useful
and the information’s provided by them was clear and made us
to create a vivid picturization of our complications.
REFERENCES
[1] Rao, Y. N., Sudhir Gupta, and S. P. Agarwal. "National Cancer Control
Programme: Current Status & Strategies." Fifty Years of Cancer Control
In India. Dir Gen of Health Services, MOHFW, Government of India
(2002): 41-7.
[2] Thangavel, Kuttiannan, P. Palanichamy Jaganathan, and P. O. Easmi.
"Data mining approach to cervical cancer patients analysis using
clustering technique." Asian Journal of Information Technology 5.4
(2006): 413-417.
[3] Agrawal, Rakesh, Tomasz Imieliński, and Arun Swami. "Mining
association rules between sets of items in large databases." ACM
SIGMOD Record. Vol. 22. No. 2. ACM, 1993.
[4] Selim, Shokri Z., and Mohamed A. Ismail. "K-means-type algorithms: a
generalized convergence theorem and characterization of local
optimality." Pattern Analysis and Machine Intelligence, IEEE
Transactions on 1 (1984): 81-87.
[5] Sun, Geng, et al. "Research on K-means Clustering Algorithm." Journal
of Changchun Normal University 2 (2011): 001.
[6] Saslow, Debbie, et al. "American Cancer Society guideline for the early
detection of cervical neoplasia and cancer." CA: a cancer journal for
clinicians 52.6 (2002): 342-362.
[7] Ananthanarayana, V. S., M. Narasimha Murty, and D. K. Subramanian.
"Efficient clustering of large data sets." Pattern Recognition 34.12
(2001): 2561-2563.
[8] Petignat, Patrick, and Michel Roy. "Diagnosis and management of
cervical cancer." BMJ: British Medical Journal 335.7623 (2007): 765.
[9] American Cancer Society (accessed 9/5/07) some information taken
from www.cancer.org
[10] Ali, Jehad, et al. "Random forests and decision trees." International
Journal of Computer Science Issues (IJCSI) 9.5 (2012).
[11] Random Forest Algorithm

More Related Content

Similar to A Pioneering Cervical Cancer Prediction Prototype in Medical Data Mining using Clustering Pattern

Similar to A Pioneering Cervical Cancer Prediction Prototype in Medical Data Mining using Clustering Pattern (20)

BREAST TUMOR DETECTION USING EFFICIENT MACHINE LEARNING AND DEEP LEARNING TEC...
BREAST TUMOR DETECTION USING EFFICIENT MACHINE LEARNING AND DEEP LEARNING TEC...BREAST TUMOR DETECTION USING EFFICIENT MACHINE LEARNING AND DEEP LEARNING TEC...
BREAST TUMOR DETECTION USING EFFICIENT MACHINE LEARNING AND DEEP LEARNING TEC...
 
Breast Tumor Detection Using Efficient Machine Learning and Deep Learning Tec...
Breast Tumor Detection Using Efficient Machine Learning and Deep Learning Tec...Breast Tumor Detection Using Efficient Machine Learning and Deep Learning Tec...
Breast Tumor Detection Using Efficient Machine Learning and Deep Learning Tec...
 
IRJET - Cervical Cancer Prognosis using MARS and Classification
IRJET - Cervical Cancer Prognosis using MARS and ClassificationIRJET - Cervical Cancer Prognosis using MARS and Classification
IRJET - Cervical Cancer Prognosis using MARS and Classification
 
Classification AlgorithmBased Analysis of Breast Cancer Data
Classification AlgorithmBased Analysis of Breast Cancer DataClassification AlgorithmBased Analysis of Breast Cancer Data
Classification AlgorithmBased Analysis of Breast Cancer Data
 
A Novel DBSCAN Approach to Identify Microcalcifications in Cancer Images with...
A Novel DBSCAN Approach to Identify Microcalcifications in Cancer Images with...A Novel DBSCAN Approach to Identify Microcalcifications in Cancer Images with...
A Novel DBSCAN Approach to Identify Microcalcifications in Cancer Images with...
 
Breast Cancer Detection Using Machine Learning
Breast Cancer Detection Using Machine LearningBreast Cancer Detection Using Machine Learning
Breast Cancer Detection Using Machine Learning
 
IRJET - Classification of Cancer Images using Deep Learning
IRJET -  	  Classification of Cancer Images using Deep LearningIRJET -  	  Classification of Cancer Images using Deep Learning
IRJET - Classification of Cancer Images using Deep Learning
 
USING DATA MINING TECHNIQUES FOR DIAGNOSIS AND PROGNOSIS OF CANCER DISEASE
USING DATA MINING TECHNIQUES FOR DIAGNOSIS AND PROGNOSIS OF CANCER DISEASEUSING DATA MINING TECHNIQUES FOR DIAGNOSIS AND PROGNOSIS OF CANCER DISEASE
USING DATA MINING TECHNIQUES FOR DIAGNOSIS AND PROGNOSIS OF CANCER DISEASE
 
Logistic Regression Model for Predicting the Malignancy of Breast Cancer
Logistic Regression Model for Predicting the Malignancy of Breast CancerLogistic Regression Model for Predicting the Malignancy of Breast Cancer
Logistic Regression Model for Predicting the Malignancy of Breast Cancer
 
Image processing and machine learning techniques used in computer-aided dete...
Image processing and machine learning techniques  used in computer-aided dete...Image processing and machine learning techniques  used in computer-aided dete...
Image processing and machine learning techniques used in computer-aided dete...
 
An approach of cervical cancer diagnosis using class weighting and oversampli...
An approach of cervical cancer diagnosis using class weighting and oversampli...An approach of cervical cancer diagnosis using class weighting and oversampli...
An approach of cervical cancer diagnosis using class weighting and oversampli...
 
IRJET- Comparison of Breast Cancer Detection using Probabilistic Neural Netwo...
IRJET- Comparison of Breast Cancer Detection using Probabilistic Neural Netwo...IRJET- Comparison of Breast Cancer Detection using Probabilistic Neural Netwo...
IRJET- Comparison of Breast Cancer Detection using Probabilistic Neural Netwo...
 
Breast Cancer Stage Classification on Digital Mammogram Images
Breast Cancer Stage Classification on Digital Mammogram ImagesBreast Cancer Stage Classification on Digital Mammogram Images
Breast Cancer Stage Classification on Digital Mammogram Images
 
PREDICTION OF BREAST CANCER,COMPARATIVE REVIEW OF MACHINE LEARNING TECHNIQUES...
PREDICTION OF BREAST CANCER,COMPARATIVE REVIEW OF MACHINE LEARNING TECHNIQUES...PREDICTION OF BREAST CANCER,COMPARATIVE REVIEW OF MACHINE LEARNING TECHNIQUES...
PREDICTION OF BREAST CANCER,COMPARATIVE REVIEW OF MACHINE LEARNING TECHNIQUES...
 
Breast cancer detection using machine learning approaches: a comparative study
Breast cancer detection using machine learning approaches: a  comparative studyBreast cancer detection using machine learning approaches: a  comparative study
Breast cancer detection using machine learning approaches: a comparative study
 
Cervical Cancer Analysis
Cervical Cancer AnalysisCervical Cancer Analysis
Cervical Cancer Analysis
 
Lung Cancer Detection Using Deep Learning Algorithms
Lung Cancer Detection Using Deep Learning AlgorithmsLung Cancer Detection Using Deep Learning Algorithms
Lung Cancer Detection Using Deep Learning Algorithms
 
IRJET - Breast Cancer Prediction using Supervised Machine Learning Algorithms...
IRJET - Breast Cancer Prediction using Supervised Machine Learning Algorithms...IRJET - Breast Cancer Prediction using Supervised Machine Learning Algorithms...
IRJET - Breast Cancer Prediction using Supervised Machine Learning Algorithms...
 
IRJET - Classifying Breast Cancer Tumour Type using Convolution Neural Netwo...
IRJET  - Classifying Breast Cancer Tumour Type using Convolution Neural Netwo...IRJET  - Classifying Breast Cancer Tumour Type using Convolution Neural Netwo...
IRJET - Classifying Breast Cancer Tumour Type using Convolution Neural Netwo...
 
IRJET- Breast Cancer Prediction using Supervised Machine Learning Algorithms
IRJET- Breast Cancer Prediction using Supervised Machine Learning AlgorithmsIRJET- Breast Cancer Prediction using Supervised Machine Learning Algorithms
IRJET- Breast Cancer Prediction using Supervised Machine Learning Algorithms
 

More from IIRindia

Silhouette Threshold Based Text Clustering for Log Analysis
Silhouette Threshold Based Text Clustering for Log AnalysisSilhouette Threshold Based Text Clustering for Log Analysis
Silhouette Threshold Based Text Clustering for Log Analysis
IIRindia
 
Analysis and Representation of Igbo Text Document for a Text-Based System
Analysis and Representation of Igbo Text Document for a Text-Based SystemAnalysis and Representation of Igbo Text Document for a Text-Based System
Analysis and Representation of Igbo Text Document for a Text-Based System
IIRindia
 
A Survey on E-Learning System with Data Mining
A Survey on E-Learning System with Data MiningA Survey on E-Learning System with Data Mining
A Survey on E-Learning System with Data Mining
IIRindia
 
V5_I2_2016_Paper11.docx
V5_I2_2016_Paper11.docxV5_I2_2016_Paper11.docx
V5_I2_2016_Paper11.docx
IIRindia
 
Performance Evaluation of Feature Selection Algorithms in Educational Data Mi...
Performance Evaluation of Feature Selection Algorithms in Educational Data Mi...Performance Evaluation of Feature Selection Algorithms in Educational Data Mi...
Performance Evaluation of Feature Selection Algorithms in Educational Data Mi...
IIRindia
 
Comparative Analysis of Weighted Emphirical Optimization Algorithm and Lazy C...
Comparative Analysis of Weighted Emphirical Optimization Algorithm and Lazy C...Comparative Analysis of Weighted Emphirical Optimization Algorithm and Lazy C...
Comparative Analysis of Weighted Emphirical Optimization Algorithm and Lazy C...
IIRindia
 

More from IIRindia (20)

An Investigation into Brain Tumor Segmentation Techniques
An Investigation into Brain Tumor Segmentation Techniques An Investigation into Brain Tumor Segmentation Techniques
An Investigation into Brain Tumor Segmentation Techniques
 
E-Agriculture - A Way to Digitalization
E-Agriculture - A Way to DigitalizationE-Agriculture - A Way to Digitalization
E-Agriculture - A Way to Digitalization
 
A Survey on the Analysis of Dissolved Oxygen Level in Water using Data Mining...
A Survey on the Analysis of Dissolved Oxygen Level in Water using Data Mining...A Survey on the Analysis of Dissolved Oxygen Level in Water using Data Mining...
A Survey on the Analysis of Dissolved Oxygen Level in Water using Data Mining...
 
Kidney Failure Due to Diabetics – Detection using Classification Algorithm in...
Kidney Failure Due to Diabetics – Detection using Classification Algorithm in...Kidney Failure Due to Diabetics – Detection using Classification Algorithm in...
Kidney Failure Due to Diabetics – Detection using Classification Algorithm in...
 
Silhouette Threshold Based Text Clustering for Log Analysis
Silhouette Threshold Based Text Clustering for Log AnalysisSilhouette Threshold Based Text Clustering for Log Analysis
Silhouette Threshold Based Text Clustering for Log Analysis
 
Analysis and Representation of Igbo Text Document for a Text-Based System
Analysis and Representation of Igbo Text Document for a Text-Based SystemAnalysis and Representation of Igbo Text Document for a Text-Based System
Analysis and Representation of Igbo Text Document for a Text-Based System
 
A Survey on E-Learning System with Data Mining
A Survey on E-Learning System with Data MiningA Survey on E-Learning System with Data Mining
A Survey on E-Learning System with Data Mining
 
Image Segmentation Based Survey on the Lung Cancer MRI Images
Image Segmentation Based Survey on the Lung Cancer MRI ImagesImage Segmentation Based Survey on the Lung Cancer MRI Images
Image Segmentation Based Survey on the Lung Cancer MRI Images
 
The Preface Layer for Auditing Sensual Interacts of Primary Distress Conceali...
The Preface Layer for Auditing Sensual Interacts of Primary Distress Conceali...The Preface Layer for Auditing Sensual Interacts of Primary Distress Conceali...
The Preface Layer for Auditing Sensual Interacts of Primary Distress Conceali...
 
Feature Based Underwater Fish Recognition Using SVM Classifier
Feature Based Underwater Fish Recognition Using SVM ClassifierFeature Based Underwater Fish Recognition Using SVM Classifier
Feature Based Underwater Fish Recognition Using SVM Classifier
 
A Survey on Educational Data Mining Techniques
A Survey on Educational Data Mining TechniquesA Survey on Educational Data Mining Techniques
A Survey on Educational Data Mining Techniques
 
V5_I2_2016_Paper11.docx
V5_I2_2016_Paper11.docxV5_I2_2016_Paper11.docx
V5_I2_2016_Paper11.docx
 
A Study on MRI Liver Image Segmentation using Fuzzy Connected and Watershed T...
A Study on MRI Liver Image Segmentation using Fuzzy Connected and Watershed T...A Study on MRI Liver Image Segmentation using Fuzzy Connected and Watershed T...
A Study on MRI Liver Image Segmentation using Fuzzy Connected and Watershed T...
 
A Clustering Based Collaborative and Pattern based Filtering approach for Big...
A Clustering Based Collaborative and Pattern based Filtering approach for Big...A Clustering Based Collaborative and Pattern based Filtering approach for Big...
A Clustering Based Collaborative and Pattern based Filtering approach for Big...
 
Hadoop and Hive Inspecting Maintenance of Mobile Application for Groceries Ex...
Hadoop and Hive Inspecting Maintenance of Mobile Application for Groceries Ex...Hadoop and Hive Inspecting Maintenance of Mobile Application for Groceries Ex...
Hadoop and Hive Inspecting Maintenance of Mobile Application for Groceries Ex...
 
Performance Evaluation of Feature Selection Algorithms in Educational Data Mi...
Performance Evaluation of Feature Selection Algorithms in Educational Data Mi...Performance Evaluation of Feature Selection Algorithms in Educational Data Mi...
Performance Evaluation of Feature Selection Algorithms in Educational Data Mi...
 
A Review of Edge Detection Techniques for Image Segmentation
A Review of Edge Detection Techniques for Image SegmentationA Review of Edge Detection Techniques for Image Segmentation
A Review of Edge Detection Techniques for Image Segmentation
 
Leanness Assessment using Fuzzy Logic Approach: A Case of Indian Horn Manufac...
Leanness Assessment using Fuzzy Logic Approach: A Case of Indian Horn Manufac...Leanness Assessment using Fuzzy Logic Approach: A Case of Indian Horn Manufac...
Leanness Assessment using Fuzzy Logic Approach: A Case of Indian Horn Manufac...
 
Comparative Analysis of Weighted Emphirical Optimization Algorithm and Lazy C...
Comparative Analysis of Weighted Emphirical Optimization Algorithm and Lazy C...Comparative Analysis of Weighted Emphirical Optimization Algorithm and Lazy C...
Comparative Analysis of Weighted Emphirical Optimization Algorithm and Lazy C...
 
Survey on Segmentation Techniques for Spinal Cord Images
Survey on Segmentation Techniques for Spinal Cord ImagesSurvey on Segmentation Techniques for Spinal Cord Images
Survey on Segmentation Techniques for Spinal Cord Images
 

Recently uploaded

Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Christo Ananth
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
dharasingh5698
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
Tonystark477637
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Dr.Costas Sachpazis
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
ankushspencer015
 

Recently uploaded (20)

Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01
 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
 
Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdf
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
chapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringchapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineering
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
 
UNIT-IFLUID PROPERTIES & FLOW CHARACTERISTICS
UNIT-IFLUID PROPERTIES & FLOW CHARACTERISTICSUNIT-IFLUID PROPERTIES & FLOW CHARACTERISTICS
UNIT-IFLUID PROPERTIES & FLOW CHARACTERISTICS
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
 
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...
 
NFPA 5000 2024 standard .
NFPA 5000 2024 standard                                  .NFPA 5000 2024 standard                                  .
NFPA 5000 2024 standard .
 
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELLPVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
 

A Pioneering Cervical Cancer Prediction Prototype in Medical Data Mining using Clustering Pattern

  • 1. Integrated Intelligent Research (IIR) International Journal of Data Mining Techniques and Applications Volume: 04 Issue: 02 December 2015 Page No.63-66 ISSN: 2278-2419 63 A Pioneering Cervical Cancer Prediction Prototype in Medical Data Mining using Clustering Pattern R.Vidya1 ,G.M.Nasira2 1 Research scholar of M.S.University,Tirunelveli, Assistant Professor /Department of Computer Science St.Joseph’s college, Cuddalore, India 2 Assistant Professor/Department of Computer Science, ChikkannaGovernment College for Women, Tirupur, India Email: vidya.sjc@gmail.com, nasiragm99@yahoo.com Abstract—Let us not make the cure of the disease more unbearable than the disease itself this quote is the most durable and inspirational line of medicine field. Data mining is said to be an umbrella term which refers to the progression of finding out the patterns in data. This can be even succeeded typically with an assistance of authoritative algorithm to automate search (as a part). This paper reveals out, how the C2P (Cervical Cancer Prediction) model is approached by a data mining algorithm for prediction. The prediction of C2 (Cervical Cancer) has been a challenging problem in research field. In the Data mining applications, we are utilizing RFT (Random Forest Tree) algorithm to do the prediction. To the best of our knowledge, we use popular clustering K-means technique to achieve more accuracy. Keywords— Cervical Cancer, Random Forest Tree, K-means learning, Data mining, Clustering K-means. I. INTRODUCTION Cancer of the cervix is one of the most prevalent forms of cancer worldwide, the major burden of the disease being felt in developing countries like India. Cervical carcinoma still continues to be the most common cancer among women and accounts for the maximum cancer deaths each year. Persistent infections with High-Risk (HR) Human Papilloma viruses, such as HPV 16,18,31,33 and 45 have been identified as a major development of the distance [8]. This model interpretability and prediction accuracy provided by Random Forest is very unique among popular machine learning methods. Accurate predictions and better generalizations are achieved due to utilization of ensemble strategies and random sampling.Random Forest three main features that gained focus are: (i) Accuratepredictions results for a variety of applications. (ii) Through model training, the importance of each feature can be measured (iii) Trained model can measure the pair-wise proximity between the samples. The research of our paper is as followed by cervical cancer and diseases and then proceeded by medical data mining and RFT where we deal with K-Mean algorithm in the next phase. Followed by this we have given out our C2 P Model along with how to prevent cancer. Later the paper concludes by Result and conclusion. Characterized by abnormal bleeding, pelvic pain and unusual heavy discharge, the disease develops in the tissues of the cervix, a part connecting the upper body of the uterus to the vagina. II. CERVICAL CANCER DISEASE AND ITS SYMPTOMS In worldwide C2 is the third most common cancer in women and seventh overall. Majority of this global burden is felt in low and middle income developing countries (WHO, 2010) and in low socio-economic groups within countries [1].It comprises of endocervix or the upper part which is close to the uterus and covered by glandular cells; and the ecocervix, the lower part which is close to the vagina and covered by Seamus cells. The two regions of the cervix meet at the “transformation zone”. It is this region where most cervical cancer begins to develop [6]. Figure 1. Cervix Cancer with its Cells (Developing Stage) A. Signs & Symptoms of Cervical Cancer Early cervical cancer has no symptoms and it cannot be identified in a clear way. Symptoms do not begin until the pre- cancer grows to an aggressive stage and starts to spread to nearby tissue. When this happens the most common symptoms are: Figure 2. Cancer Cells in Progressive Stage
  • 2. Integrated Intelligent Research (IIR) International Journal of Data Mining Techniques and Applications Volume: 04 Issue: 02 December 2015 Page No.63-66 ISSN: 2278-2419 64  Vaginal bleeding abnormally primarily bleeding after sex, after menopause, spotting between periods, long or heavy periods than normal, bleeding after pelvic exam.  Unusual discharge from vagina- the discharge may contain blood.  Vaginal intercourse  These symptoms can be caused other than cervical cancer too but anyway having a check on it will be good. III. MEDICAL DATA MINING AND RFT RFT is best among classification which is able to classify and predict large amount of data with accuracy[11]. Random Forest Tree gives an opportunity to set parameter there is no need to prune tree. Accuracy and variable will be generated automatically. The outliers are very sensitive in training data. Different subset of training dataset are selected (2/3) with replacement to train each tree. Remaining training data (OOB) are used to estimate error and variable importance class assignment is made by the number of votes from all of the trees and for regression the average of the result is used. Figure 3. Structure of RFT RFT algorithm was developed by Leo Breiman and Adelecutler. RFT grows into many classification trees.Each tree is grown as follow [10].  If number of training set cases set are N, Sample N are collected at random but with the replacement from original data. This sample will be training tree  M input variables a number MM is specified such that each node, m variable are selected at random out of M  Best split on this m is used in split the node. The value of m is detained continuously during the forest growing  Every tree is grown to the largest extent possible so there is no pruning IV. CLUSTERING K-MEANS ALGORITHM Whitening is a common preprocess in deep learning work which provides better performance in respect to the choice of unsupervised learning algorithm. Preprocessing is a common practice forperforming many simple normalization steps before attaining its final stage[4]. We apply a K-Mean clustering to learn K centroids C(k) from the input data. Given the learned centroidsC(k, we consider two choices for the feature mapping f, This is called K-means hard.The second is a non-linear mapping that attempts to be “softer” than the above encoding but also yield sparse outputs through simple arithmetic, Is the mean of the elements of z. This is called K-means triangle[3]. K-Means Algorithm Unsupervised-Learning-The k-means algorithm is build upon the following operations Step 1-Choose initial cluster centers C1, C2, …, CK randomly from the n points.W1,W2,WN, WI € Rm. Step 2-Assign point Wi, I=1,2,…,N to cluster Zj= 1,2,…K if and only if ||wq – cq|| , P=1,2,…K and J ≠ P Ties are resolved arbitrarily. Step 3-Compute the new cluster centers C1 *, C 2 *, …,Ck * as following C1 * = (I/n) Wj I = 1,2,…K Wj ZJ Step 4-If Ci * = Ci, I = 1,2,…K then terminate otherwise CiCj * and go to step 2. Data mining provides the methodology and technology to analysis the useful information of data[3]. Clustering is a way that classifies the raw data reasonably and searches the hidden patterns that may exist in datasets. [7] K-mean is a greedy algorithm; can only converge to a local minimum clustering algorithm is divided into two categories. Partition and hierarchical clustering algorithm K-means partitioned one [2] Proposed by the queen in 1967. K-means is numerical unsupervised non-deterministic interactive method [5]. Figure 4. Process of K-means Learning This table 1 gives the result of Logistic analysis for regression. In total 11 risk factors are included, the main factors which gives up to fatal rise of problems like Illiteracy, Married Life,
  • 3. Integrated Intelligent Research (IIR) International Journal of Data Mining Techniques and Applications Volume: 04 Issue: 02 December 2015 Page No.63-66 ISSN: 2278-2419 65 Early Menarche, Multiple Pair and Poor Hygiene. Using the linear transform the statistical weight for the risk factors is identified. The statistical weight analysis as followed with figure 6 of statistical weight of risk factor. Table 2 shows the classification of data subjects which thethe identification of cases and controls. The total risk category is used on a score of 7 with the interval limit. The classification based on control is as follow V. C2 P MODEL – DATA SET AND RISK GROUP DETECTION Risk factors can be classified in to four categories. Category I contains white discharge, hip-pain, smelly vaginal discharge, early marriage, sex at early age and multiple sexual partners. Category II includes risk factors likely malnutrition, male sexual behaviors, husband’s food habit and living condition. Category III – (psychological factors) Category IV – (age, family history)[9]. Architecure of C2 P model VI. PREVENTING CANCER COUNT ON The cervical cancer does not have any symptoms until it reaches an aggressive stage. So a minor problem should be taken into account like  Unbalanced vaginal Bleeding  Vaginal Discharge with Odor  Vaginal Discharge in a Watery form  Vaginal Discharge along with blood A. Preventive steps  General education for women population about life style factors.  Risk factor screening in general  Collecting information like age, family history, menopause period & husband’s activities.  Collecting results of laboratory texts including Pap smear test and biopsy test.  Collecting direct investigation reports like VILLA etc.  Giving counsel to go the nearest hospital for the screening regularly C2 dataset is taken from open source data online. The data are then clustered with k-means algorithm using MATLAB. TABLE I. RESULT OF NUMEROUS LOGISTIC REGRESSION ANALYSIS AND STATISTICAL WEIGHTINESS OF THE SIGNIFICANT RISK FACTOR Risk Factor Regression Co-Efficient Odds Ratio 95% of RC for OR Statistical Weight Literacy 1.2345 3.527 2.207 -5.556 13 Married Life 0.3385 1.699 1.439 -3.494 7 Early Menarche 0.9795 1.402 0.889-2.212 5 Marital Status 0.3327 2.663 0.708-1.873 2 Genetal Infection 0.1410 2.145 0.453-1.190 5 Multiple Pair 0.0675 1.151 1.354-3.398 7 Abortion 0.0448 1.264 0.660-1.732 2 Tobacco Use 0.637 0.734 1.693-4.190 6 Poor hygiene 0.5299 1.499 0.708-1.873 10 Figure 5. Statistical Weight of Risk Factor Figure 6. Score Control Analysis for Case VII. RESULT Score Cases (n=230) Controls (n=230) 0-7 20 76 8-14 50 42 15-21 20 34 22-28 50 40 28-35 56 20 36-42 61 18
  • 4. Integrated Intelligent Research (IIR) International Journal of Data Mining Techniques and Applications Volume: 04 Issue: 02 December 2015 Page No.63-66 ISSN: 2278-2419 66 With RFT- data mining algorithm we achieved 93.37% accuracy. But, with enhancement, i.e., with k-means learning we achieved 97.37% result in our research work. TABLE II. DATA MINING ACCURACY SL.NO DATA-MINING ALGORITHM ACCURACY 1 RFT 93.54 2 RFT WITH K-MEANS LEARNING 96.77 Figure 7. Result of accuracy using RFT Figure 8. Architecure of C2 P model VIII. CONCLUSION Cervical cancer ranks as the first most frequent cancer among women in India. Tree predictors make a combination where every tree depends on the random vector sampled value independently with the same distribution for all the trees is said to be Random Forest. It is a perfect tool for building guesses considering overfit if there or not so to avoid large numbers. In this work, K-means learning was introduced to achieve better accuracy.Experimental results conducted on the collected cervical cancer data set demonstrated the effectiveness of the proposed techniques. Particularly, the proposed method mainly aims to predict the cancer cervix which is usually done by data mining algorithm. This paper is expected to be of benefit for medical decision making systems to give an alternative choice for medical practitioners to construct more accurate predictive models and stronger classifiers.We are bound to thank the Doctors of the JIPMER Hospital, Puducherry for their information. It was very useful and the information’s provided by them was clear and made us to create a vivid picturization of our complications. REFERENCES [1] Rao, Y. N., Sudhir Gupta, and S. P. Agarwal. "National Cancer Control Programme: Current Status & Strategies." Fifty Years of Cancer Control In India. Dir Gen of Health Services, MOHFW, Government of India (2002): 41-7. [2] Thangavel, Kuttiannan, P. Palanichamy Jaganathan, and P. O. Easmi. "Data mining approach to cervical cancer patients analysis using clustering technique." Asian Journal of Information Technology 5.4 (2006): 413-417. [3] Agrawal, Rakesh, Tomasz Imieliński, and Arun Swami. "Mining association rules between sets of items in large databases." ACM SIGMOD Record. Vol. 22. No. 2. ACM, 1993. [4] Selim, Shokri Z., and Mohamed A. Ismail. "K-means-type algorithms: a generalized convergence theorem and characterization of local optimality." Pattern Analysis and Machine Intelligence, IEEE Transactions on 1 (1984): 81-87. [5] Sun, Geng, et al. "Research on K-means Clustering Algorithm." Journal of Changchun Normal University 2 (2011): 001. [6] Saslow, Debbie, et al. "American Cancer Society guideline for the early detection of cervical neoplasia and cancer." CA: a cancer journal for clinicians 52.6 (2002): 342-362. [7] Ananthanarayana, V. S., M. Narasimha Murty, and D. K. Subramanian. "Efficient clustering of large data sets." Pattern Recognition 34.12 (2001): 2561-2563. [8] Petignat, Patrick, and Michel Roy. "Diagnosis and management of cervical cancer." BMJ: British Medical Journal 335.7623 (2007): 765. [9] American Cancer Society (accessed 9/5/07) some information taken from www.cancer.org [10] Ali, Jehad, et al. "Random forests and decision trees." International Journal of Computer Science Issues (IJCSI) 9.5 (2012). [11] Random Forest Algorithm