Design
of an Intelligent System for Improving
Classification of Cancer Diseases
PREPARED BY:
MOHAMMED WAJEEH JASIM
SUPERVISED BY:
PROF. HAZEM M. EL-BAKRY
DR. MOHAMMED LOEY
Topics
• INTRODUCTION
• PROBLEM DEFINITION AND CHALLENGES
• RESEARCH OBJECTIVES
• RESEARCH MOTIVATIONS
• THE PROPOSED SYSTEM
• RESULTS ANALYSIS AND EVALUATION
• CONCLUSION AND FUTURE WORK
• ACKNOWLEDGMENTS
• REFERENCES
Introduction 1/3
• Cancer is a common disease caused by certain abnormal changes
to the genes that are responsible for cell division and growth.
• These changes include mutations of the DNA
that makes up genes.
• Generally, cancer cells have significantly more genetic changes
than normal cells.
Introduction 2/3
• Early detection of cancer increases the probability of recovery.
• This research presents an intelligent decision support system
(IDSS) for early diagnosis of cancer.
• The proposed methodology was applied to three data sets
(breast, colon, and CNS).
• The proposed system is assessed and compared with the performance
of related work.
Introduction 3/3
Figure: Development of cancer at the molecular and cell level, plotted against the size of the cancer cell.
Stages: Normal → Genetic disorder → Proliferation starts → Premalignant lesion → Malignant cancer → Metastases → Death.
Other labels: self immune exclusion; cancer cell of 1 kg.
• Genetic diagnosis level: evaluation of the cancer risk at the premalignant level and detection of minute cancer cells that tests such as imaging tests cannot discover; early diagnosis, prevention control, and prevention control for recurrence.
• Image test level.
The diagram is based on the hypothesis of the multi-staged development of cancer.
Problem Definition
and Challenges
• Cancer Classification Problem
Challenges
1. Curse of dimensionality
2. Difficulty of data extraction
3. Only a few genes display relevancy to cancer diagnoses
4. In the cancer domain, we want to achieve biological relevancy as well as
classification accuracy and reliability.
• DNA Microarray Technology [?]
• Gene Expression Profiles [?]
Research Objectives
• To build a classification system for classifying cancer based
on the gene expression profiles
• To help biologists in the discovery and diagnosis of future
cancer cases
• It shall provide competitive advantages over the existing
systems
• The proposed system shall cover several types of cancer
diseases in order to assure the suitability of this system for
cancer classification application domain
Research Motivations
World Cancer Research Fund (WCRF)
• During the last decade, newly discovered cancer cases increased by 20%,
resulting in 27 million cases worldwide
World Health Organization (WHO)
• Number of cancer cases in the Middle East will be doubled by 2030
• Middle East will have the highest relative increase of all regions in the world
According to the World Cancer Organization
• About 4,610 cases of central nervous system tumors and various brain tumors will be
diagnosed in 2018 in children under the age of 20 in the United States.
• In 2013, there were an estimated 23,130 new cases of brain and other nervous system
cancers and an estimated 14,080 deaths in the United States.
The proposed System 1/9
Architecture of the System
Input Phase: gene expression profile data set
→ Filter feature selection approach: feature selection by IG
→ Wrapper feature selection approach: feature reduction by GWO, classification by SVM (classifier)
→ Output Phase: report
The proposed System 2/9
Description of Benchmark Datasets
| Data set | Classes | Genes | Samples | Class distribution |
|---|---|---|---|---|
| Breast Cancer | relapse, non-relapse | 24481 | 97 (78 train, 19 test) | Train: 34 relapse, 44 non-relapse; test: 12 relapse, 7 non-relapse |
| Colon tumor | Normal ("Positive"), Tumor ("Negative") | 2000 | 62 | Positive: 22; Negative: 40 |
| Central Nervous System (CNS) | Failures ("Class 0"), Survivors ("Class 1") | 7129 | 60 | Class 0: 39; Class 1: 21 |
The proposed System 3/9
Gene Selection Technique
• There is more than one technique for selecting
genes; each technique has its advantages and disadvantages
• In the proposed system, two approaches were
used (Filter Approach and Wrapper Approach)
Filter Approach
• Filter techniques are ranking techniques that
assess the relevance of a gene by considering only
the intrinsic properties of the data.
• The proposed methodology uses information gain
(IG) for selecting the most significant features
from the input patterns.
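As a rough illustration of the filter step, the sketch below scores each gene by information gain, IG = H(class) - H(class | gene), and keeps the genes whose score exceeds a threshold. It is not the system's implementation (the slides state that Weka's IG filter was used); the function names, the toy data, and the assumption that expression values are already discretised into "low"/"high" are all hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(values, labels):
    """IG = H(class) - H(class | feature) for one discrete feature."""
    total = len(labels)
    conditional = 0.0
    for v in set(values):
        subset = [lab for x, lab in zip(values, labels) if x == v]
        conditional += len(subset) / total * entropy(subset)
    return entropy(labels) - conditional

def select_genes(columns, labels, threshold=0.0):
    """Return (gene_index, IG) pairs with IG above the threshold, best first."""
    scored = [(i, information_gain(col, labels)) for i, col in enumerate(columns)]
    kept = [(i, ig) for i, ig in scored if ig > threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy data: 3 genes (one list per gene), 6 samples.
labels = ["tumor", "tumor", "tumor", "normal", "normal", "normal"]
genes = [
    ["high", "high", "high", "low", "low", "low"],      # separates the classes
    ["high", "low", "high", "low", "high", "low"],      # weakly informative
    ["high", "high", "high", "high", "high", "high"],   # constant, IG = 0
]
ranking = select_genes(genes, labels, threshold=0.0)
```

With a threshold of zero the constant gene is dropped and the remaining two are ranked; raising the threshold keeps fewer genes, which is the knob varied in the threshold experiments later in the deck.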
The proposed System 4/9
IG flowchart (figure). Recoverable steps:
1. Start; enter data having A attributes.
2. Initialize the parameters: I = 0, S = 0, E = ∅.
3. Entropy calculation.
4. Initialize attributes of classes.
5. Loop over samples (S = S + 1) until S > max number of samples: calculate conditional probability and conditional entropy.
6. IG calculation; E = IG value {I}.
7. Loop over attributes (I + 1) until I > max number of attributes.
8. Arrange E values in ascending order; end.
The proposed System 5/9
Feature Reduction
• The features (genes) selected by the IG filter are reduced by
applying the wrapper approach using the Grey Wolf
Optimization (GWO) algorithm.
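For reference, the position update commonly given for GWO in the literature is reproduced below. The slides do not show these equations; the symbols a, A and C follow the usual textbook notation, and it is only an assumption that they correspond to the parameters the deck calls b, B and F.

```latex
\begin{aligned}
\vec{A} &= 2a\,\vec{r}_1 - a, \qquad \vec{C} = 2\,\vec{r}_2, \qquad a \text{ decreases linearly from } 2 \text{ to } 0,\\
\vec{D}_\alpha &= \lvert \vec{C}_1 \cdot \vec{X}_\alpha - \vec{X} \rvert, \quad
\vec{D}_\beta = \lvert \vec{C}_2 \cdot \vec{X}_\beta - \vec{X} \rvert, \quad
\vec{D}_\delta = \lvert \vec{C}_3 \cdot \vec{X}_\delta - \vec{X} \rvert,\\
\vec{X}_1 &= \vec{X}_\alpha - \vec{A}_1 \cdot \vec{D}_\alpha, \quad
\vec{X}_2 = \vec{X}_\beta - \vec{A}_2 \cdot \vec{D}_\beta, \quad
\vec{X}_3 = \vec{X}_\delta - \vec{A}_3 \cdot \vec{D}_\delta,\\
\vec{X}(t+1) &= \frac{\vec{X}_1 + \vec{X}_2 + \vec{X}_3}{3}.
\end{aligned}
```

Here the vectors r_1 and r_2 have components drawn uniformly from [0, 1], and X_α, X_β, X_δ are the positions of the three best wolves in the current iteration.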
The proposed System 6/9
GWO flowchart (figure). Recoverable steps:
1. Start; generate the initial grey wolf pack and the parameters b, B and F.
2. Calculate the fitness value for each wolf (W = W + 1 until W > max number of wolves).
3. Set Yα = first best fitness value, Yβ = second best fitness value, Yδ = third best fitness value.
4. For each wolf (until W > max number of wolves): modify the position of the current search wolf.
5. Modify the values of b, B and F; calculate the fitness value.
6. I = I + 1; repeat until I > max number of iterations.
7. Return Yα; end.
Legend: I = number of iterations, W = number of wolves; both counters start at 0.
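The loop in the flowchart can be sketched as a small binary GWO that searches over 0/1 gene masks. This is a simplified illustration, not the authors' C#.NET code: the sigmoid transfer function, the default parameter values and the toy fitness function are assumptions. In the proposed system the fitness would be the SVM classification accuracy on the selected genes; here a hypothetical stand-in rewards picking three "relevant" indices and penalises subset size.

```python
import math
import random

def binary_gwo(fitness, n_features, n_wolves=10, n_iterations=30, seed=0):
    """Minimal binary Grey Wolf Optimizer; returns (best 0/1 mask, its fitness)."""
    rng = random.Random(seed)
    wolves = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_wolves)]
    best_mask, best_score = None, float("-inf")
    for it in range(n_iterations):
        # Rank the pack: alpha, beta and delta are the three fittest wolves.
        ranked = sorted(wolves, key=fitness, reverse=True)
        alpha, beta, delta = ranked[0], ranked[1], ranked[2]
        if fitness(alpha) > best_score:
            best_mask, best_score = alpha[:], fitness(alpha)
        a = 2.0 * (1 - it / n_iterations)  # decreases linearly from 2 towards 0
        new_wolves = []
        for wolf in wolves:
            new_wolf = []
            for d in range(n_features):
                estimate = 0.0
                for leader in (alpha, beta, delta):
                    A = 2 * a * rng.random() - a
                    C = 2 * rng.random()
                    D = abs(C * leader[d] - wolf[d])
                    estimate += leader[d] - A * D
                estimate /= 3.0
                # Assumed sigmoid transfer: map the continuous position to a bit.
                prob = 1.0 / (1.0 + math.exp(-10.0 * (estimate - 0.5)))
                new_wolf.append(1 if rng.random() < prob else 0)
            new_wolves.append(new_wolf)
        wolves = new_wolves
    # Also consider the pack produced by the last iteration.
    final_best = max(wolves, key=fitness)
    if fitness(final_best) > best_score:
        best_mask, best_score = final_best[:], fitness(final_best)
    return best_mask, best_score

INFORMATIVE = {0, 3, 5}  # hypothetical "relevant genes" of the toy problem

def toy_fitness(mask):
    """Stand-in for classifier accuracy: reward relevant genes, penalise size."""
    hits = sum(1 for i, bit in enumerate(mask) if bit and i in INFORMATIVE)
    return hits - 0.1 * sum(mask)

mask, score = binary_gwo(toy_fitness, n_features=8)
```

The returned mask marks the selected features with 1. Because the search is stochastic, the result depends on the seed, and finding the optimum is not guaranteed.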
The proposed System 7/9
Classification by Support Vector Machines (SVMs)
Accuracy in %:
| Reference | Classifier | CNS | Colon | Breast |
|---|---|---|---|---|
| [6], Weka tool | C4.5 | 55 | 76.19 | 61 |
| [6], Weka tool | Naïve Bayes | 63.3 | 52.14 | 51.89 |
| [6], Weka tool | IB1 | 55.79 | 73.38 | 60.22 |
| In this work | SVM | 70 | 63 | 65 |
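To make the classifier step concrete, here is a minimal linear SVM trained by sub-gradient descent on the regularised hinge loss. It is a stand-in for illustration only: the slides use the SVM provided by Weka, and the function names, hyperparameters and two-feature toy data below are hypothetical.

```python
def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Fit weights w and bias b on the L2-regularised hinge loss; y in {-1, +1}."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:
                # Sample violates the margin: move towards classifying it correctly.
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Margin satisfied: only the regularisation term shrinks w.
                w = [wj - lr * lam * wj for wj in w]
    return w, b

def predict(w, b, x):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Hypothetical toy samples with two "genes" each; +1 and -1 are the two classes.
X = [[2.0, 1.0], [1.5, 2.0], [2.5, 1.5], [-1.0, -1.5], [-2.0, -1.0], [-1.5, -2.5]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
predictions = [predict(w, b, x) for x in X]
```

On real microarray data the input would be the expression values of the selected genes, and accuracy would be estimated by cross-validation rather than on the training samples.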
The proposed System 8/9
Flowchart
The proposed System 9/9
Implementation of the system
I. In the proposed methodology, C#.NET was used to implement the system.
II. The core of the Weka tool (version 3.8) was called from C#.NET to
apply the IG filter to the dataset for attribute selection.
III. Then, the selected attributes are reduced by GWO, which was programmed in
C#.NET.
IV. Finally, the classifier, called from Weka within C#.NET, gives the
final classification accuracy.
V. The proposed system uses 5-fold cross-validation.
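The 5-fold cross-validation in step V can be sketched as follows. The fold construction is an assumption (a simple stratified split that keeps class proportions similar across folds); the slides do not say how the folds were built. The `evaluate` callback is a hypothetical placeholder for "train the classifier on the training indices and return its accuracy on the test indices".

```python
import random
from collections import defaultdict

def stratified_k_fold(labels, k=5, seed=0):
    """Split sample indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds

def cross_validate(labels, evaluate, k=5, seed=0):
    """Average the score returned by evaluate(train_idx, test_idx) over k folds."""
    folds = stratified_k_fold(labels, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(evaluate(train_idx, test_idx))
    return sum(scores) / k

# CNS class distribution from the dataset table: 39 "Class 0" and 21 "Class 1".
cns_labels = [0] * 39 + [1] * 21
folds = stratified_k_fold(cns_labels, k=5)

def majority_baseline(train_idx, test_idx):
    """Hypothetical evaluate(): accuracy of always predicting Class 0."""
    return sum(1 for i in test_idx if cns_labels[i] == 0) / len(test_idx)

baseline_accuracy = cross_validate(cns_labels, majority_baseline, k=5)
```

A majority-class baseline like this gives a useful floor (about 65% for CNS) against which the reported accuracies can be read.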
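Results analysis and evaluation 1/10
SVM classification accuracy (CAcc) on the microarray datasets, and SVM with GWO, using 5-fold cross-validation.
| Dataset | SVM: No. genes | SVM: CAcc in % | No. wolves | No. iterations | SVM+GWO: No. genes | SVM+GWO: CAcc in % |
|---|---|---|---|---|---|---|
| CNS | 7129 | 70 | 20 | 50 | 4683 | 73.3 |
| | | | 25 | 50 | 3590 | 73.3 |
| | | | 50 | 75 | 3583 | 71.6 |
| | | | 75 | 150 | 4672 | 73.3 |
| Breast | 24482 | 64.1 | 25 | 50 | 16285 | 70.512 |
| | | | 35 | 75 | 16055 | 71.795 |
| | | | 50 | 100 | 16122 | 70.512 |
| | | | 100 | 20 | 16104 | 70.512 |
| | | | 100 | 25 | 12259 | 70.512 |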
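Results analysis and evaluation 2/10
SVM classification accuracy (CAcc) on the microarray datasets, and SVM with GWO, using 5-fold cross-validation.
| Dataset | SVM: No. genes | SVM: CAcc in % | No. wolves | No. iterations | SVM+GWO: No. genes | SVM+GWO: CAcc in % |
|---|---|---|---|---|---|---|
| Colon | 2000 | 80 | 12 | 11 | 1316 | 83.87 |
| | | | 20 | 30 | 1311 | 83.87 |
| | | | 30 | 30 | 1014 | 83.87 |
| | | | 40 | 30 | 1012 | 83.87 |
| | | | 50 | 30 | 963 | 83.87 |
| | | | 200 | 50 | 1017 | 83.87 |
| | | | 75 | 100 | 1335 | 85.484 |
| | | | 100 | 120 | 1013 | 83.87 |
| | | | 120 | 160 | 999 | 85.484 |
| | | | 600 | 50 | 1025 | 85.484 |
| | | | 1000 | 2 | 996 | 85.484 |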
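Results analysis and evaluation 3/10
CAcc of SVM + IG with an IG threshold value of zero, using 5-fold cross-validation.
| Dataset | No. genes before IG | IG+SVM: No. genes | IG+SVM: CAcc in % | No. iterations | No. wolves | IG+GWO+SVM: No. genes | IG+GWO+SVM: CAcc in % |
|---|---|---|---|---|---|---|---|
| CNS | 7129 | 73 | 90 | 50 | 20 | 50 | 96.667 |
| | | | | 50 | 25 | 51 | 95 |
| | | | | 75 | 50 | 47 | 96.667 |
| | | | | 150 | 75 | 49 | 95 |
| Breast | 24482 | 715 | 82 | 20 | 12 | 504 | 87.17 |
| | | | | 50 | 20 | 470 | 88.46 |
| | | | | 70 | 50 | 478 | 88.46 |
| | | | | 120 | 100 | 455 | 88.46 |
| | | | | 150 | 120 | 459 | 88.46 |
| Colon | 2000 | 135 | 87.096 | 25 | 13 | 81 | 90.32 |
| | | | | 30 | 15 | 80 | 90.32 |
| | | | | 50 | 20 | 66 | 90.32 |
| | | | | 75 | 50 | 70 | 90.32 |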
Results analysis and evaluation 4/10
SVM CAcc (in %) with multiple IG thresholds, using 5-fold cross-validation (IG+SVM).
| Dataset | Threshold = 0.17: No. genes | CAcc | Threshold = 0.2: No. genes | CAcc | Threshold = 0.198: No. genes | CAcc | Threshold = 0.29: No. genes | CAcc |
|---|---|---|---|---|---|---|---|---|
| CNS | 73 | 90 | 54 | 88.3 | 55 | 91.67 | 14 | 66.67 |
| Breast | 350 | 80.77 | 398 | 84.61 | 441 | 83.3 | 28 | 78.2 |
| Colon | 135 | 87.09 | 108 | 85.48 | 117 | 87.9 | 31 | 82.25 |
Results analysis and evaluation 5/10
CAcc (in %) for multiple IG thresholds + GWO + SVM, using 5-fold cross-validation (IG+GWO+SVM).
| Dataset | No. wolves | No. iterations | Threshold = 0.17: No. genes | CAcc | Threshold = 0.2: No. genes | CAcc | Threshold = 0.198: No. genes | CAcc | Threshold = 0.29: No. genes | CAcc |
|---|---|---|---|---|---|---|---|---|---|---|
| CNS | 20 | 50 | 50 | 96.67 | 39 | 96.67 | 47 | 95 | 12 | 68.3 |
| | 25 | 50 | 49 | 95 | 36 | 95 | 39 | 96.67 | 8 | 68.3 |
| | 50 | 75 | 39 | 93 | 35 | 95 | 36 | 95 | 11 | 68.11 |
| | 75 | 150 | 45 | 96.67 | 41 | 95 | 32 | 96.67 | 11 | 68.3 |
| Breast | 20 | 50 | 338 | 88.46 | 282 | 91.026 | 290 | 89.74 | 20 | 83.3 |
| | 50 | 70 | 349 | 89.74 | 260 | 91.026 | 249 | 91.026 | 17 | 83.3 |
| | 100 | 120 | 337 | 89.74 | 272 | 92.307 | 250 | 94.87 | 16 | 84.61 |
| | 120 | 150 | 351 | 91.025 | 245 | 92.307 | 270 | 93.59 | 18 | 84.61 |
| Colon | 13 | 25 | 74 | 88.7 | 50 | 88.7 | 74 | 90.322 | 17 | 90.322 |
| | 15 | 30 | 75 | 90.322 | 77 | 90.322 | 78 | 90.322 | 23 | 91.935 |
| | 20 | 50 | 64 | 90.322 | 70 | 90.322 | 82 | 90.322 | 17 | 90.322 |
| | 50 | 75 | 85 | 90.322 | 62 | 90.322 | 56 | 90.322 | 17 | 95.161 |
Results analysis and evaluation 6/10
The best results obtained by applying the proposed methodology on the datasets.
| Dataset | Threshold | No. genes | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|---|
| CNS | 0.198 | 32 | 96.667 | 100 | 91 | 95.5 |
| Breast | 0.198 | 250 | 94.87 | 95 | 90 | 92 |
| Colon | 0.29 | 17 | 95.161 | 95.2 | 90.9 | 93 |
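The four reported metrics all derive from the confusion matrix. The sketch below shows the standard formulas. The slides do not give the confusion matrices, so the counts used here are hypothetical, chosen only because they are consistent with the Colon class sizes (22 positive, 40 negative) and happen to reproduce the Colon row after rounding.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 (as fractions) from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts for the 62 Colon samples (not taken from the slides).
acc, prec, rec, f1 = classification_metrics(tp=20, fp=1, fn=2, tn=39)
print(f"accuracy={acc:.3%} precision={prec:.1%} recall={rec:.1%} F1={f1:.1%}")
```

Under these assumed counts, accuracy is 59/62, precision 20/21, recall 20/22 and F1 40/43, which round to 95.161, 95.2, 90.9 and 93.0 respectively.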
Results analysis and evaluation 7/10
Summary of the best results obtained by applying the proposed methodology on the datasets (CAcc in %).
| Method | Breast: No. genes | Breast: CAcc | Colon: No. genes | Colon: CAcc | CNS: No. genes | CNS: CAcc |
|---|---|---|---|---|---|---|
| SVM | 24482 | 65 | 2000 | 63 | 7129 | 70 |
| GWO+SVM | 16055 | 71.795 | 999 | 85.484 | 3590 | 73.3 |
| IG+SVM (before modified IG threshold) | 715 | 82 | 135 | 87.096 | 73 | 90 |
| IG+GWO+SVM (before modified IG threshold) | 455 | 88.46 | 66 | 90.32 | 47 | 96.67 |
| IG+SVM (after modified IG threshold) | 398 | 84.61 | 117 | 87.9 | 55 | 91.67 |
| IG+GWO+SVM (after modified IG threshold) | 250 | 94.87 | 17 | 95.161 | 32 | 96.67 |
Results analysis and evaluation 8/10
The CAcc of the proposed methodology vs. other methodologies on CNS.
| Reference | Methodology (filter + wrapper + classifier) | Accuracy in % |
|---|---|---|
| [1] | T-Statistics, SNR, F-Test + GA + SVM | 81.25 |
| [1] | T-Statistics, SNR, F-Test + GA + KNN | 81.25 |
| [2] | IG + GA + GP | 86.67 |
| [3] | Collection of attribute selectors and classification algorithms | 75.49 |
| [4] | Optimized Fuzzy Rule Generation (OFRG) algorithm | 95 |
| This work | IG + GWO + SVM | 96.67 |
Results analysis and evaluation 9/10
The CAcc of the proposed methodology vs. other methodologies on Breast.
| Reference | Methodology | Accuracy in % |
|---|---|---|
| [3] | ReliefF + 3-NN | 70.96 |
| [4] | Optimized Fuzzy Rule Generation (OFRG) algorithm | 94 |
| [5] | filtering and normalization + PSO + SVM; filtering and normalization + GA + SVM | 94 |
| This work | IG + GWO + SVM | 94.87 |
Results analysis and evaluation 10/10
The CAcc of the proposed methodology vs. other methodologies on Colon.
| Reference | Methodology | Accuracy in % |
|---|---|---|
| [1] | T-Statistics, SNR, F-Test + GA + SVM | 85 |
| [1] | T-Statistics, SNR, F-Test + GA + KNN | 85 |
| [3] | Random + SVM | 88.41 |
| [2] | IG + GA + GP | 85.48 |
| This work | IG + GWO + SVM | 95.161 |
Conclusion
• Despite its simplicity compared to previous work, the proposed
system outperforms it.
• The approach is superior to the systems proposed in [1, 2,
3, 4, 5] in terms of classification accuracy.
• The proposed system reduces the time and memory
consumption required for the classification
process.
• The experimental results indicate that the proposed
methodology is able to enhance the stability of the
classification accuracy as well as of the feature selection.
Publication
1. http://www.globalcis.org/dl/citation.html?id=JCIT-4416&fbclid=IwAR14Zseub0ooasvX_wtDIC4jI3jcu_rq3ToI2b3VROOFoRaQJuhRqxASFsM
2. http://www.ijarcst.com/doc/vol6issue3/mohammed.pdf?fbclid=IwAR1kexWwX530SogO55ZoFXd81pBC1YocL7Bj2vVWZfaiBar1OCYQSCqkNyE
References
1. C. Gunavathi and K. Premalatha, "Performance analysis of genetic algorithm with kNN and SVM for feature selection in tumor classification," Int J Comput Electr Autom Control Inf Eng, vol. 8, pp. 1490-7, 2014.
2. H. Salem, G. Attiya and N. El-Fishawy, "Classification of human cancer diseases by gene expression profiles," Applied Soft Computing, vol. 50, pp. 124-134, 2017.
3. C. J. Alonso-González, Q. I. Moro-Sancho, A. Simon-Hurtado and R. Varela-Arrabal, "Microarray gene expression classification with few genes: Criteria to combine attribute selection and classification methods," Expert Systems with Applications, vol. 39, pp. 7270-7280, 2012.
4. A. Paul, J. Sil and C. D. Mukhopadhyay, "Gene selection for designing optimal fuzzy rule base classifier by estimating missing value," Applied Soft Computing, vol. 55, pp. 276-288, 2017.
5. N. Y. Moteghaed, K. Maghooli and M. Garshasbi, "Improving Classification of Cancer and Mining Biomarkers from Gene Expression Profiles Using Hybrid Optimization Algorithms and Fuzzy Support Vector Machine," Journal of Medical Signals and Sensors, vol. 8, p. 1, 2018.
6. V. Bolón-Canedo, N. Sánchez-Maroño and A. Alonso-Betanzos, "An ensemble of filters and classifiers for microarray data classification," Pattern Recognition, vol. 45, pp. 531-539, 2012.
Spare Slides
Microarray Technology
• Cells are the most basic units of all organisms.
• Each cell contains a nucleus.
• Inside the nucleus there are chromosomes, which are made of
deoxyribonucleic acid (DNA).
• Nucleotides are the basic units of DNA; each consists of a sugar-phosphate
backbone and one of four bases:
adenine (A), cytosine (C), guanine (G), and thymine (T).
• The hereditary information for making future organisms is coded by the DNA
through the particular order of these base pairs on a double-stranded helix.
Microarray Technology
Figure: The relationship among the cell, the nucleus, a chromosome and a gene.
• Cell nucleus: where chromosomes "live"
• Chromosomes: contain all genetic material
• DNA: the material from which chromosomes are constructed
• Gene: a segment of a chromosome (made up of DNA)
Gene Expression Profiles
• The amount of mRNA produced by each cell
• Can be monitored to detect alterations or mutations
• Each cell needs to construct proteins in order to develop and function
• The process of constructing proteins:
  • Transcription: copying the genetic information into mRNA molecules
  • Translation: translating mRNA into the proteins' amino acid sequences
• Once the protein is constructed, the gene is said to be expressed
• mRNA sequences are able to hybridize with their complementary DNA (cDNA)
Figure: DNA → transcription → RNA → translation → amino acid chain → folding → protein.
Microarray Technology
Figure: DNA double-stranded helix.
Microarray Datasets
Microarray Technology
The microarray data used in the proposed system
Microarray Technology
…
