SlideShare a Scribd company logo
1 of 4
Download to read offline
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 06 Issue: 04 | Apr 2019 www.irjet.net p-ISSN: 2395-0072
© 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 4770
EXPLORING COLORECTAL CANCER GENES THROUGH DATA MINING
TECHNIQUES
R. DEVENTHIRAN1, S. SENTHILKUMAR2
1 PG Scholar, Department of MCA, Arulmigu Meenakshi Amman College of Engineering,
AnnaUniversity, Vadamavandal (near Kanchipuram), India.
2 Assistant Professor, Department of MCA, Arulmigu Meenakshi Amman College of Engineering, Anna University,
Vadamavandal (near Kanchipuram), India.
-------------------------------------------------------------------------***------------------------------------------------------------------------
ABSTRACT: - Data mining is used in various medical applications like tumor classification, protein structure prediction,
gene classification, cancer classification based on microarray data, clustering of gene expression data, statistical model of
protein-protein interaction etc. Drug events in prediction of medical test effectiveness can be done based on genomics and
proteomics through data mining approaches. Cancer detection is one of the hot research topics in the bioinformatics.
Classification of colon cancer dataset using WEKA 3.6, in which Logistics, IBK, KSTAR, NNGE, AD Tree, Random Forest
Algorithms show 100 % correctly classified instances, followed by Navie bayes and PART with 97.22 %, Simple Cart and
Zero R has shown the least with 50 % of correctly classified instances. Mean absolute error and Root mean squared error
are shown low for Logistics, KSTAR and NNGE.
1 INTRODUCTION:
Accumulation of data all over the globe is growing at a
fast rate especially in the health field. One of the main
interesting fields is cancer in general. In UAE, for example,
there are 35% of residences affected by the colon cancer.
Generally, the Colon Cancer often occurs with the Rectal
Cancer and called Colorectal Cancer (CRC). All people over
68 are more disposed to have the Colorectal Cancer, and
the percent of Colon and Rectum Cancer deaths is highest
among people aged 75-84.
Colon and Rectum Cancer represent 8% of all new
Cancer cases where there will be an estimated 50,000 new
cases of Colon Cancer in the United States in 2017. In UAE,
the Colorectal Cancer is considered as the second largest
cause of death from cancer after breast cancer.
1.1 EXISTING SYSTEM:
Various types of medical cancer research are in top
priority around the globe. Data mining can help clear out
unseen cancer treatment for different kinds of populations,
ethnicities, genders, economic and social influences that
may affect assessments worked on Extreme Learning
Machine (ELM) for classification.
EXISTING TECHNIQUES:
Zero R algorithm.
TECHNIQUES DEFINITION:
Zero Rule (Zero R) algorithm is the simplest
classification method which relies on the target and ignores
all predictors. It can be used to determine a baseline
performance as a benchmark for other classification
methods.
DRAWBACKS:
There is nothing to be said about the predictors
contribution to the model because Zero R does not use any
of them.
1.2 PROPOSED SYSTEM:
Bayesian networks are a powerful probabilistic
representation, and their predictor or attributes. In
particular, the naive Bayes classifier is a Bayesian network
where the class has no parents and each attribute has the
class as its sole parent. The Naive Bayesian classifier is
based on Bayes’ theorem with independence assumptions
between predictors.
A Naive Bayesian model is easy to build, with no
complicated iterative parameter estimation which makes it
particularly useful for very large datasets.
PROPOSED TECHNIQUE:
Naïve Bayes Multinomial algorithm.
TECHNIQUE DEFINITION:
This method goes by the name of Naïve Bayes, because
it’s based on Bayes’ rule and “naïvely” assumes
independence; it is only valid to multiply probabilities
when the events are independent.
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 06 Issue: 04 | Apr 2019 www.irjet.net p-ISSN: 2395-0072
© 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 4771
ADVANTAGES:
Make the rule assign that class to this value of the
predictors
Calculate the total error of the rules of each predictor.
2. SYSTEMS SPECIFICATION:
2.1 HARDWARE REQUIREMENTS:
PROCESSOR : DUAL CORE
RAM : 4GB
HARD DISK : 250 GB
2.2 SOFTWARE REQUIREMENTS:-
FRONT END: J2EE (JSP, SERVLET), STRUCTS
BACK END: ARFF DATASETS
OPERATING SYSTEM: WINDOWS7
IDE : ECLIPSE.
3. PROJECT DESCRIPTION:
MODULES:
 Data Preprocessing Module
 Data Predictor Module
 Data Analysis Module
 Data Reporting Module
Data Preprocessing Module:
We are first collecting all the data sets from available
sources ,and based on our project we are preprocessed the
data like separating the cancer tissues ,normal tissues
,cancer values ,normal values and their names based on the
signed values .
Data Predictor Module:
In data prediction module, all the data sets like genes,
tissues classification based on signed values, if it is positive
value, the prediction will be normal tissue if it is negative
range that will be cancer, based on that the tissue analysis
happen and the expected genes result and after that we got
the confusion matrix like the overall analysis of the tissue,
genes, elements etc..
Data Analysis Module:
In the data analysis module all the datasets are analyzed
based on 4 Algorithm's and it will give the analyzed output
based on analyze techniques.
Genes
Normal
tissues
Cancer
tissues
Prediction
Confusio
n Matrix
Predicte
d Tissue
type
Expected gene
result
DATA
PREDICATOR
MODULE
MINING ACTUAL
GENE VALUE
FROM
PREDICTION
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 06 Issue: 04 | Apr 2019 www.irjet.net p-ISSN: 2395-0072
© 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 4772
Data Reporting Module:
In the data reporting module all these prediction and
analysis taken into account and our colon detector tool
summarize all the output based on confusion matrix result like
which algorithm gives the best accuracy and medicines for
cancer tissues.
4. SYSTEM DESIGN
4.1 SYSTEM ARCHITECTURE:
5. SCOPE OF THE PROJECT:
The project works for the Cancer detection from gene
expression data. We presented a deep learning technique to
detect cancer and to identify critical genes for the diagnosis
of cancer. Their results showed that the highly interactive
genes could be useful as cancer biomarkers for the
detection of cancer.
By using Zero R algorithm, Naive Bayes algorithm,
Ripper algorithm, and J48 algorithm. We are detecting the
cancer genes and normal By using Zero R algorithm,
Naive Bayes algorithm, We are detecting the cancer
genes and normal genes from the output of each algorithm,
the accuracy level gets vary.
Colorectal Cancer qualities in a Nutshell:
By and large, Cancer is a crazy cell development illness.
Colon Cancer and Rectal Cancer frequently happen together
and are called Colorectal Cancer (CRC). The Colon is the
most reduced piece of the stomach related framework.
Roughly, individuals more than 68 are increasingly inclined
to have the Colorectal Cancer, and the percent of Colon and
Rectum Cancer passing is most noteworthy among
individuals matured 75-84. Colon and Rectum Cancer
speak to 8% of all new Cancer cases.
DATA MINING TECHNIQUES:
Data mining techniques process the information by
using specific methods to discover this data and extract
vital points from the big data set. Including methods such
as generalization, characterization, classification,
clustering, association, evolution, pattern matching and
data visualization, and not limited to meta-rule guided
Mining.
(i) Zero R Algorithim
Zero Rule (Zero R) algorithm is the simplest classification
method which relies on the target and ignores all predictors. It
can be used to determine a baseline performance as a
benchmark for other classification methods.
(ii) Naïve Bayes Multinomial algorithm
This method goes by the name of Naïve Bayes, because
it’s based on Bayes’ rule and “naïvely” assumes
independence; it is only valid to multiply probabilities
when the events are independent. Bayesian networks are a
powerful probabilistic representation, and their predictor
or attributes.
(iii)Repeated Incremental Pruning to Produce Error
Reduction (RIPPER)
RIPPER is a well-known algorithm that utilizes
separate-and-conquer strategy to distinct between positive
and negative instances. It is considered as the master of the
inductive rule learning that has the ability to deal with
over-fitting problem.
(iv)J48
J48 is a prediction algorithm that can be used
successfully in many demanding problem domains and also
to create Decision Trees. It is like a dataset including two
lists: a list of predictors (independent variables) and a list
of targets (dependent variables) that would allow you to
predict the target variable of a new dataset record. Decision
tree J48 implementation is based on the C4.5 algorithm ID3
(Iterative Dichotomies 3) developed by the WEKA project
team.
ACTUAL GENE
VALUE
PREDICTION
GENE
ANALYSIS
REPORT
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 06 Issue: 04 | Apr 2019 www.irjet.net p-ISSN: 2395-0072
© 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 4773
PROJECT IMPLEMENTATION:
(i) Data Collection
(ii) Classification
(iii) Clustering
(iv) Prediction
The Colon Cancer is a forceful and understood illness
that influences individuals all around the globe. To examine
this sort of malignancy, this examination endeavors to
investigate the relations between the qualities in charge of
colon disease by means of Data Mining arrangement
procedures. WEKA has been utilized as a Data mining
device to investigate a known dataset for Colorectal Cancer
qualities from writing. Distinctive procedures were utilized
and thought about. The outcomes were promising and the
investigation step can be talked about with a specialist
soon.
Accumulation of data all over the globe is growing at a
fast rate especially in the health field. One of the main
interesting fields is cancer in general. The Colon Cancer, in
particular, is an aggressive and well known disease that
affects people all around the world. In UAE, for example,
there are 35% of residences affected by the colon cancer.
Generally, the Colon Cancer often occurs with the Rectal
Cancer and called Colorectal Cancer (CRC).
(i)Data Collection:
In data collection phase different cancer genes collected as
in the form of data sets. The information includes the
normal tissues, cancer tissues etc.
(ii)Classification:
Classification is the most common data mining technique,
which employs a set of pre classified examples to develop a
model that can classify the population of records at large.
This approach frequently employs decision tree or neural
network - based classification algorithms. The data
classification process involves learning and classification.
(iii)Clustering:
Clustering can be said as identification of similar classes of
objects. By using clustering techniques we can further
identify dense and sparse regions in object space and can
discover overall distribution pattern and correlations
among data attributes.
(iv)Prediction:
Predicting the cancer genes based upon four important
algorithms based on previously cancer genes and normal
genes behavior and activities. In this module can predict
which tissues are affected by cancer genes and normal
genes.
6. CONCLUSIONS:
This simple study aims at exploring the Colorectal
Cancer disease using the available data in literature. The
used dataset contains the expression of the 2000 genes
with highest minimal intensity across the 62 tissues with
40 tumor tissues and 22 normal ones. The best algorithm in
terms of classification performance was Naïve Bayes
Multinomial with very close results.
The next step was to filter the 2000 genes into 26 using a
well-known feature selection algorithm and apply a rule-
based classification algorithm on the new dataset. The
generated rules were very clear and adding a new insight
to the database in general. It seems that the gene “M26383”
is a very important gene that can determine a normal tissue
if it is below 56.9. As a general finding, these rules can be
further studied by an expert in the field.
7. FUTURE ENHANCEMENT:
Designing efficient online mechanisms for a bi-directional
market is significantly more challenging. We consider
server energy cost minimization in social welfare
maximization, and reveal an important property, sub
modularity, of the objective function in the resulting
significantly more challenging offline problem.
8. REFERENCES:
[1] (2017, February) Centers for Disease Control and
Prevention.[Online]. HYPERLINK "https://www.cdc.gov/"
https://www.cdc.gov/
[2] National Cancer Institute. [Online]. HYPERLINK
"https://seer.cancer.gov/statfacts/html/colorect.html"
https://seer.cancer.gov/statfacts/html/colorect.html
[3] (2017, February) The Surveillance, Epidemiology, and
End Results.[Online]. HYPERLINK
"https://seer.cancer.gov/" https://seer.cancer.gov/
[4] Centers for Disease Control and Prevention
(CDC).[Online]. HYPERLINK
"https://www.cdc.gov/cancer/colorectal/index.htm"
https://www.cdc.gov/cancer/colorectal/index.htm
[5] U. Alon et al., "Broad patterns of gene expression
revealed by clustering analysis of tumor and normal colon
tissues probed by oligonucleotide arrays," Proceedings of
the National Academy of Sciences, vol. 96, no. 12, pp. 6745-
6750, 1999.
[6] Jesse Samuel Moore and Tess Hannah Aulet, "Colorectal
Cancer Screening," Surgical Clinics of North Ameria, vol. 97,
no. 3, pp. 487-502, 2017.
[7]"https://seer.cancer.gov/statfacts/html/colorect.html"
[8]https://seer.cancer.gov/statfacts/html/colorect.html.

More Related Content

What's hot

Iganfis Data Mining Approach for Forecasting Cancer Threats
Iganfis Data Mining Approach for Forecasting Cancer ThreatsIganfis Data Mining Approach for Forecasting Cancer Threats
Iganfis Data Mining Approach for Forecasting Cancer Threats
ijsrd.com
 

What's hot (20)

a novel approach for breast cancer detection using data mining tool weka
a novel approach for breast cancer detection using data mining tool wekaa novel approach for breast cancer detection using data mining tool weka
a novel approach for breast cancer detection using data mining tool weka
 
IRJET- Breast Cancer Prediction using Supervised Machine Learning Algorithms
IRJET- Breast Cancer Prediction using Supervised Machine Learning AlgorithmsIRJET- Breast Cancer Prediction using Supervised Machine Learning Algorithms
IRJET- Breast Cancer Prediction using Supervised Machine Learning Algorithms
 
Breast cancer diagnosis machine learning ppt
Breast cancer diagnosis machine learning pptBreast cancer diagnosis machine learning ppt
Breast cancer diagnosis machine learning ppt
 
Breast Cancer Diagnosis using a Hybrid Genetic Algorithm for Feature Selectio...
Breast Cancer Diagnosis using a Hybrid Genetic Algorithm for Feature Selectio...Breast Cancer Diagnosis using a Hybrid Genetic Algorithm for Feature Selectio...
Breast Cancer Diagnosis using a Hybrid Genetic Algorithm for Feature Selectio...
 
Breast cancer diagnosis and recurrence prediction using machine learning tech...
Breast cancer diagnosis and recurrence prediction using machine learning tech...Breast cancer diagnosis and recurrence prediction using machine learning tech...
Breast cancer diagnosis and recurrence prediction using machine learning tech...
 
Iganfis Data Mining Approach for Forecasting Cancer Threats
Iganfis Data Mining Approach for Forecasting Cancer ThreatsIganfis Data Mining Approach for Forecasting Cancer Threats
Iganfis Data Mining Approach for Forecasting Cancer Threats
 
Breast Cancer Diagnostics with Bayesian Networks
Breast Cancer Diagnostics with Bayesian NetworksBreast Cancer Diagnostics with Bayesian Networks
Breast Cancer Diagnostics with Bayesian Networks
 
Tomato leaves diseases detection approach based on support vector machines
Tomato leaves diseases detection approach based on support vector machinesTomato leaves diseases detection approach based on support vector machines
Tomato leaves diseases detection approach based on support vector machines
 
IRJET- GDPS - General Disease Prediction System
IRJET- GDPS - General Disease Prediction SystemIRJET- GDPS - General Disease Prediction System
IRJET- GDPS - General Disease Prediction System
 
Data Mining Techniques In Computer Aided Cancer Diagnosis
Data Mining Techniques In Computer Aided Cancer DiagnosisData Mining Techniques In Computer Aided Cancer Diagnosis
Data Mining Techniques In Computer Aided Cancer Diagnosis
 
IRJET - Prediction of Autistic Spectrum Disorder based on Behavioural Fea...
IRJET -  	  Prediction of Autistic Spectrum Disorder based on Behavioural Fea...IRJET -  	  Prediction of Autistic Spectrum Disorder based on Behavioural Fea...
IRJET - Prediction of Autistic Spectrum Disorder based on Behavioural Fea...
 
Robust Breast Cancer Diagnosis on Four Different Datasets Using Multi-Classif...
Robust Breast Cancer Diagnosis on Four Different Datasets Using Multi-Classif...Robust Breast Cancer Diagnosis on Four Different Datasets Using Multi-Classif...
Robust Breast Cancer Diagnosis on Four Different Datasets Using Multi-Classif...
 
Early detection of breast cancer using mammography images and software engine...
Early detection of breast cancer using mammography images and software engine...Early detection of breast cancer using mammography images and software engine...
Early detection of breast cancer using mammography images and software engine...
 
Machine Learning - Breast Cancer Diagnosis
Machine Learning - Breast Cancer DiagnosisMachine Learning - Breast Cancer Diagnosis
Machine Learning - Breast Cancer Diagnosis
 
Predictive Analysis of Breast Cancer Detection using Classification Algorithm
Predictive Analysis of Breast Cancer Detection using Classification AlgorithmPredictive Analysis of Breast Cancer Detection using Classification Algorithm
Predictive Analysis of Breast Cancer Detection using Classification Algorithm
 
A new model for large dataset dimensionality reduction based on teaching lear...
A new model for large dataset dimensionality reduction based on teaching lear...A new model for large dataset dimensionality reduction based on teaching lear...
A new model for large dataset dimensionality reduction based on teaching lear...
 
Madhavi
MadhaviMadhavi
Madhavi
 
Machine Learning Based Approaches for Cancer Classification Using Gene Expres...
Machine Learning Based Approaches for Cancer Classification Using Gene Expres...Machine Learning Based Approaches for Cancer Classification Using Gene Expres...
Machine Learning Based Approaches for Cancer Classification Using Gene Expres...
 
Rapid COVID-19 Diagnosis Using Deep Learning of the Computerized Tomography ...
Rapid COVID-19 Diagnosis Using Deep Learning  of the Computerized Tomography ...Rapid COVID-19 Diagnosis Using Deep Learning  of the Computerized Tomography ...
Rapid COVID-19 Diagnosis Using Deep Learning of the Computerized Tomography ...
 
V34132136
V34132136V34132136
V34132136
 

Similar to IRJET- Exploring Colorectal Cancer Genes through Data Mining Techniques

Similar to IRJET- Exploring Colorectal Cancer Genes through Data Mining Techniques (20)

Breast Cancer Detection Using Machine Learning
Breast Cancer Detection Using Machine LearningBreast Cancer Detection Using Machine Learning
Breast Cancer Detection Using Machine Learning
 
IRJET - Survey on Analysis of Breast Cancer Prediction
IRJET - Survey on Analysis of Breast Cancer PredictionIRJET - Survey on Analysis of Breast Cancer Prediction
IRJET - Survey on Analysis of Breast Cancer Prediction
 
A Comprehensive Survey On Predictive Analysis Of Breast Cancer
A Comprehensive Survey On Predictive Analysis Of Breast CancerA Comprehensive Survey On Predictive Analysis Of Breast Cancer
A Comprehensive Survey On Predictive Analysis Of Breast Cancer
 
IRJET - Breast Cancer Prediction using Supervised Machine Learning Algorithms...
IRJET - Breast Cancer Prediction using Supervised Machine Learning Algorithms...IRJET - Breast Cancer Prediction using Supervised Machine Learning Algorithms...
IRJET - Breast Cancer Prediction using Supervised Machine Learning Algorithms...
 
SEMI SUPERVISED BASED SPATIAL EM FRAMEWORK FOR MICROARRAY ANALYSIS
SEMI SUPERVISED BASED SPATIAL EM FRAMEWORK FOR MICROARRAY ANALYSISSEMI SUPERVISED BASED SPATIAL EM FRAMEWORK FOR MICROARRAY ANALYSIS
SEMI SUPERVISED BASED SPATIAL EM FRAMEWORK FOR MICROARRAY ANALYSIS
 
A SURVEY ON BLOOD DISEASE DETECTION USING MACHINE LEARNING
A SURVEY ON BLOOD DISEASE DETECTION USING MACHINE LEARNINGA SURVEY ON BLOOD DISEASE DETECTION USING MACHINE LEARNING
A SURVEY ON BLOOD DISEASE DETECTION USING MACHINE LEARNING
 
IRJET- Diagnosis of Breast Cancer using Decision Tree Models and SVM
IRJET-  	  Diagnosis of Breast Cancer using Decision Tree Models and SVMIRJET-  	  Diagnosis of Breast Cancer using Decision Tree Models and SVM
IRJET- Diagnosis of Breast Cancer using Decision Tree Models and SVM
 
Breast Cancer Prediction
Breast Cancer PredictionBreast Cancer Prediction
Breast Cancer Prediction
 
LIFE EXPECTANCY PREDICTION FOR POST THORACIC SURGERY
LIFE EXPECTANCY PREDICTION FOR POST THORACIC SURGERYLIFE EXPECTANCY PREDICTION FOR POST THORACIC SURGERY
LIFE EXPECTANCY PREDICTION FOR POST THORACIC SURGERY
 
IRJET- Breast Cancer Prediction using Deep Learning
IRJET-  	  Breast Cancer Prediction using Deep LearningIRJET-  	  Breast Cancer Prediction using Deep Learning
IRJET- Breast Cancer Prediction using Deep Learning
 
IRJET- Breast Cancer Relapse Prognosis by Classic and Modern Structures o...
IRJET-  	  Breast Cancer Relapse Prognosis by Classic and Modern Structures o...IRJET-  	  Breast Cancer Relapse Prognosis by Classic and Modern Structures o...
IRJET- Breast Cancer Relapse Prognosis by Classic and Modern Structures o...
 
IRJET- Heart Disease Prediction System
IRJET- Heart Disease Prediction SystemIRJET- Heart Disease Prediction System
IRJET- Heart Disease Prediction System
 
Cervical Cancer Analysis
Cervical Cancer AnalysisCervical Cancer Analysis
Cervical Cancer Analysis
 
IRJET- Oral Cancer Detection using Machine Learning
IRJET- Oral Cancer Detection using Machine LearningIRJET- Oral Cancer Detection using Machine Learning
IRJET- Oral Cancer Detection using Machine Learning
 
Health Care Application using Machine Learning and Deep Learning
Health Care Application using Machine Learning and Deep LearningHealth Care Application using Machine Learning and Deep Learning
Health Care Application using Machine Learning and Deep Learning
 
IRJET- Machine Learning Classification Algorithms for Predictive Analysis in ...
IRJET- Machine Learning Classification Algorithms for Predictive Analysis in ...IRJET- Machine Learning Classification Algorithms for Predictive Analysis in ...
IRJET- Machine Learning Classification Algorithms for Predictive Analysis in ...
 
Simplified Knowledge Prediction: Application of Machine Learning in Real Life
Simplified Knowledge Prediction: Application of Machine Learning in Real LifeSimplified Knowledge Prediction: Application of Machine Learning in Real Life
Simplified Knowledge Prediction: Application of Machine Learning in Real Life
 
IRJET- Lung Cancer Nodules Classification and Detection using SVM and CNN...
IRJET-  	  Lung Cancer Nodules Classification and Detection using SVM and CNN...IRJET-  	  Lung Cancer Nodules Classification and Detection using SVM and CNN...
IRJET- Lung Cancer Nodules Classification and Detection using SVM and CNN...
 
LEAF DISEASE IDENTIFICATION AND REMEDY RECOMMENDATION SYSTEM USINGCNN
LEAF DISEASE IDENTIFICATION AND REMEDY RECOMMENDATION SYSTEM USINGCNNLEAF DISEASE IDENTIFICATION AND REMEDY RECOMMENDATION SYSTEM USINGCNN
LEAF DISEASE IDENTIFICATION AND REMEDY RECOMMENDATION SYSTEM USINGCNN
 
IRJET- A New Hybrid Squirrel Search Algorithm and Invasive Weed Optimization ...
IRJET- A New Hybrid Squirrel Search Algorithm and Invasive Weed Optimization ...IRJET- A New Hybrid Squirrel Search Algorithm and Invasive Weed Optimization ...
IRJET- A New Hybrid Squirrel Search Algorithm and Invasive Weed Optimization ...
 

More from IRJET Journal

More from IRJET Journal (20)

TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
 
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURESTUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
 
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
 
Effect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsEffect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil Characteristics
 
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
 
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
 
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
 
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
 
A REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASA REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADAS
 
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
 
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProP.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
 
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
 
Survey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemSurvey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare System
 
Review on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesReview on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridges
 
React based fullstack edtech web application
React based fullstack edtech web applicationReact based fullstack edtech web application
React based fullstack edtech web application
 
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
 
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
 
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
 
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignMultistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
 
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
 

Recently uploaded

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
ssuser89054b
 
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments""Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
mphochane1998
 
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Verification of thevenin's theorem for BEEE Lab (1).pptx
Verification of thevenin's theorem for BEEE Lab (1).pptxVerification of thevenin's theorem for BEEE Lab (1).pptx
Verification of thevenin's theorem for BEEE Lab (1).pptx
chumtiyababu
 

Recently uploaded (20)

Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments""Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
 
kiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal loadkiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal load
 
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced LoadsFEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
 
Introduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaIntroduction to Serverless with AWS Lambda
Introduction to Serverless with AWS Lambda
 
School management system project Report.pdf
School management system project Report.pdfSchool management system project Report.pdf
School management system project Report.pdf
 
Wadi Rum luxhotel lodge Analysis case study.pptx
Wadi Rum luxhotel lodge Analysis case study.pptxWadi Rum luxhotel lodge Analysis case study.pptx
Wadi Rum luxhotel lodge Analysis case study.pptx
 
Computer Lecture 01.pptxIntroduction to Computers
Computer Lecture 01.pptxIntroduction to ComputersComputer Lecture 01.pptxIntroduction to Computers
Computer Lecture 01.pptxIntroduction to Computers
 
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptxHOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
 
Online electricity billing project report..pdf
Online electricity billing project report..pdfOnline electricity billing project report..pdf
Online electricity billing project report..pdf
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdf
 
PE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and propertiesPE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and properties
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
 
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
 
Verification of thevenin's theorem for BEEE Lab (1).pptx
Verification of thevenin's theorem for BEEE Lab (1).pptxVerification of thevenin's theorem for BEEE Lab (1).pptx
Verification of thevenin's theorem for BEEE Lab (1).pptx
 
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . ppt
 
Employee leave management system project.
Employee leave management system project.Employee leave management system project.
Employee leave management system project.
 

IRJET- Exploring Colorectal Cancer Genes through Data Mining Techniques

  • 1. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 06 Issue: 04 | Apr 2019 www.irjet.net p-ISSN: 2395-0072 © 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 4770 EXPLORING COLORECTAL CANCER GENES THROUGH DATA MINING TECHNIQUES R. DEVENTHIRAN1, S. SENTHILKUMAR2 1 PG Scholar, Department of MCA, Arulmigu Meenakshi Amman College of Engineering, AnnaUniversity, Vadamavandal (near Kanchipuram), India. 2 Assistant Professor, Department of MCA, Arulmigu Meenakshi Amman College of Engineering, Anna University, Vadamavandal (near Kanchipuram), India. -------------------------------------------------------------------------***------------------------------------------------------------------------ ABSTRACT: - Data mining is used in various medical applications like tumor classification, protein structure prediction, gene classification, cancer classification based on microarray data, clustering of gene expression data, statistical model of protein-protein interaction etc. Drug events in prediction of medical test effectiveness can be done based on genomics and proteomics through data mining approaches. Cancer detection is one of the hot research topics in the bioinformatics. Classification of colon cancer dataset using WEKA 3.6, in which Logistics, IBK, KSTAR, NNGE, AD Tree, Random Forest Algorithms show 100 % correctly classified instances, followed by Navie bayes and PART with 97.22 %, Simple Cart and Zero R has shown the least with 50 % of correctly classified instances. Mean absolute error and Root mean squared error are shown low for Logistics, KSTAR and NNGE. 1 INTRODUCTION: Accumulation of data all over the globe is growing at a fast rate especially in the health field. One of the main interesting fields is cancer in general. In UAE, for example, there are 35% of residences affected by the colon cancer. Generally, the Colon Cancer often occurs with the Rectal Cancer and called Colorectal Cancer (CRC). All people over 68 are more disposed to have the Colorectal Cancer, and the percent of Colon and Rectum Cancer deaths is highest among people aged 75-84. Colon and Rectum Cancer represent 8% of all new Cancer cases where there will be an estimated 50,000 new cases of Colon Cancer in the United States in 2017. In UAE, the Colorectal Cancer is considered as the second largest cause of death from cancer after breast cancer. 1.1 EXISTING SYSTEM: Various types of medical cancer research are in top priority around the globe. Data mining can help clear out unseen cancer treatment for different kinds of populations, ethnicities, genders, economic and social influences that may affect assessments worked on Extreme Learning Machine (ELM) for classification. EXISTING TECHNIQUES: Zero R algorithm. TECHNIQUES DEFINITION: Zero Rule (Zero R) algorithm is the simplest classification method which relies on the target and ignores all predictors. It can be used to determine a baseline performance as a benchmark for other classification methods. DRAWBACKS: There is nothing to be said about the predictors contribution to the model because Zero R does not use any of them. 1.2 PROPOSED SYSTEM: Bayesian networks are a powerful probabilistic representation, and their predictor or attributes. In particular, the naive Bayes classifier is a Bayesian network where the class has no parents and each attribute has the class as its sole parent. The Naive Bayesian classifier is based on Bayes’ theorem with independence assumptions between predictors. A Naive Bayesian model is easy to build, with no complicated iterative parameter estimation which makes it particularly useful for very large datasets. PROPOSED TECHNIQUE: Naïve Bayes Multinomial algorithm. TECHNIQUE DEFINITION: This method goes by the name of Naïve Bayes, because it’s based on Bayes’ rule and “naïvely” assumes independence; it is only valid to multiply probabilities when the events are independent.
  • 2. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 06 Issue: 04 | Apr 2019 www.irjet.net p-ISSN: 2395-0072 © 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 4771 ADVANTAGES: Make the rule assign that class to this value of the predictors Calculate the total error of the rules of each predictor. 2. SYSTEMS SPECIFICATION: 2.1 HARDWARE REQUIREMENTS: PROCESSOR : DUAL CORE RAM : 4GB HARD DISK : 250 GB 2.2 SOFTWARE REQUIREMENTS:- FRONT END: J2EE (JSP, SERVLET), STRUCTS BACK END: ARFF DATASETS OPERATING SYSTEM: WINDOWS7 IDE : ECLIPSE. 3. PROJECT DESCRIPTION: MODULES:  Data Preprocessing Module  Data Predictor Module  Data Analysis Module  Data Reporting Module Data Preprocessing Module: We are first collecting all the data sets from available sources ,and based on our project we are preprocessed the data like separating the cancer tissues ,normal tissues ,cancer values ,normal values and their names based on the signed values . Data Predictor Module: In data prediction module, all the data sets like genes, tissues classification based on signed values, if it is positive value, the prediction will be normal tissue if it is negative range that will be cancer, based on that the tissue analysis happen and the expected genes result and after that we got the confusion matrix like the overall analysis of the tissue, genes, elements etc.. Data Analysis Module: In the data analysis module all the datasets are analyzed based on 4 Algorithm's and it will give the analyzed output based on analyze techniques. Genes Normal tissues Cancer tissues Prediction Confusio n Matrix Predicte d Tissue type Expected gene result DATA PREDICATOR MODULE MINING ACTUAL GENE VALUE FROM PREDICTION
  • 3. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 06 Issue: 04 | Apr 2019 www.irjet.net p-ISSN: 2395-0072 © 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 4772 Data Reporting Module: In the data reporting module all these prediction and analysis taken into account and our colon detector tool summarize all the output based on confusion matrix result like which algorithm gives the best accuracy and medicines for cancer tissues. 4. SYSTEM DESIGN 4.1 SYSTEM ARCHITECTURE: 5. SCOPE OF THE PROJECT: The project works for the Cancer detection from gene expression data. We presented a deep learning technique to detect cancer and to identify critical genes for the diagnosis of cancer. Their results showed that the highly interactive genes could be useful as cancer biomarkers for the detection of cancer. By using Zero R algorithm, Naive Bayes algorithm, Ripper algorithm, and J48 algorithm. We are detecting the cancer genes and normal By using Zero R algorithm, Naive Bayes algorithm, We are detecting the cancer genes and normal genes from the output of each algorithm, the accuracy level gets vary. Colorectal Cancer qualities in a Nutshell: By and large, Cancer is a crazy cell development illness. Colon Cancer and Rectal Cancer frequently happen together and are called Colorectal Cancer (CRC). The Colon is the most reduced piece of the stomach related framework. Roughly, individuals more than 68 are increasingly inclined to have the Colorectal Cancer, and the percent of Colon and Rectum Cancer passing is most noteworthy among individuals matured 75-84. Colon and Rectum Cancer speak to 8% of all new Cancer cases. DATA MINING TECHNIQUES: Data mining techniques process the information by using specific methods to discover this data and extract vital points from the big data set. Including methods such as generalization, characterization, classification, clustering, association, evolution, pattern matching and data visualization, and not limited to meta-rule guided Mining. (i) Zero R Algorithim Zero Rule (Zero R) algorithm is the simplest classification method which relies on the target and ignores all predictors. It can be used to determine a baseline performance as a benchmark for other classification methods. (ii) Naïve Bayes Multinomial algorithm This method goes by the name of Naïve Bayes, because it’s based on Bayes’ rule and “naïvely” assumes independence; it is only valid to multiply probabilities when the events are independent. Bayesian networks are a powerful probabilistic representation, and their predictor or attributes. (iii)Repeated Incremental Pruning to Produce Error Reduction (RIPPER) RIPPER is a well-known algorithm that utilizes separate-and-conquer strategy to distinct between positive and negative instances. It is considered as the master of the inductive rule learning that has the ability to deal with over-fitting problem. (iv)J48 J48 is a prediction algorithm that can be used successfully in many demanding problem domains and also to create Decision Trees. It is like a dataset including two lists: a list of predictors (independent variables) and a list of targets (dependent variables) that would allow you to predict the target variable of a new dataset record. Decision tree J48 implementation is based on the C4.5 algorithm ID3 (Iterative Dichotomies 3) developed by the WEKA project team. ACTUAL GENE VALUE PREDICTION GENE ANALYSIS REPORT
  • 4. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 06 Issue: 04 | Apr 2019 www.irjet.net p-ISSN: 2395-0072 © 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 4773 PROJECT IMPLEMENTATION: (i) Data Collection (ii) Classification (iii) Clustering (iv) Prediction The Colon Cancer is a forceful and understood illness that influences individuals all around the globe. To examine this sort of malignancy, this examination endeavors to investigate the relations between the qualities in charge of colon disease by means of Data Mining arrangement procedures. WEKA has been utilized as a Data mining device to investigate a known dataset for Colorectal Cancer qualities from writing. Distinctive procedures were utilized and thought about. The outcomes were promising and the investigation step can be talked about with a specialist soon. Accumulation of data all over the globe is growing at a fast rate especially in the health field. One of the main interesting fields is cancer in general. The Colon Cancer, in particular, is an aggressive and well known disease that affects people all around the world. In UAE, for example, there are 35% of residences affected by the colon cancer. Generally, the Colon Cancer often occurs with the Rectal Cancer and called Colorectal Cancer (CRC). (i)Data Collection: In data collection phase different cancer genes collected as in the form of data sets. The information includes the normal tissues, cancer tissues etc. (ii)Classification: Classification is the most common data mining technique, which employs a set of pre classified examples to develop a model that can classify the population of records at large. This approach frequently employs decision tree or neural network - based classification algorithms. The data classification process involves learning and classification. (iii)Clustering: Clustering can be said as identification of similar classes of objects. By using clustering techniques we can further identify dense and sparse regions in object space and can discover overall distribution pattern and correlations among data attributes. (iv)Prediction: Predicting the cancer genes based upon four important algorithms based on previously cancer genes and normal genes behavior and activities. In this module can predict which tissues are affected by cancer genes and normal genes. 6. CONCLUSIONS: This simple study aims at exploring the Colorectal Cancer disease using the available data in literature. The used dataset contains the expression of the 2000 genes with highest minimal intensity across the 62 tissues with 40 tumor tissues and 22 normal ones. The best algorithm in terms of classification performance was Naïve Bayes Multinomial with very close results. The next step was to filter the 2000 genes into 26 using a well-known feature selection algorithm and apply a rule- based classification algorithm on the new dataset. The generated rules were very clear and adding a new insight to the database in general. It seems that the gene “M26383” is a very important gene that can determine a normal tissue if it is below 56.9. As a general finding, these rules can be further studied by an expert in the field. 7. FUTURE ENHANCEMENT: Designing efficient online mechanisms for a bi-directional market is significantly more challenging. We consider server energy cost minimization in social welfare maximization, and reveal an important property, sub modularity, of the objective function in the resulting significantly more challenging offline problem. 8. REFERENCES: [1] (2017, February) Centers for Disease Control and Prevention.[Online]. HYPERLINK "https://www.cdc.gov/" https://www.cdc.gov/ [2] National Cancer Institute. [Online]. HYPERLINK "https://seer.cancer.gov/statfacts/html/colorect.html" https://seer.cancer.gov/statfacts/html/colorect.html [3] (2017, February) The Surveillance, Epidemiology, and End Results.[Online]. HYPERLINK "https://seer.cancer.gov/" https://seer.cancer.gov/ [4] Centers for Disease Control and Prevention (CDC).[Online]. HYPERLINK "https://www.cdc.gov/cancer/colorectal/index.htm" https://www.cdc.gov/cancer/colorectal/index.htm [5] U. Alon et al., "Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays," Proceedings of the National Academy of Sciences, vol. 96, no. 12, pp. 6745- 6750, 1999. [6] Jesse Samuel Moore and Tess Hannah Aulet, "Colorectal Cancer Screening," Surgical Clinics of North Ameria, vol. 97, no. 3, pp. 487-502, 2017. [7]"https://seer.cancer.gov/statfacts/html/colorect.html" [8]https://seer.cancer.gov/statfacts/html/colorect.html.