The goal of microarray experiments is to identify genes that are differentially transcribed with respect to different
biological conditions of cell cultures and samples. Among the large amount of genes presented in gene expression
data, only a small fraction of them is effective for performing a certain diagnostic test. Hence, one of the major tasks
with the gene expression data is to find groups of co regulated genes whose collective expression is strongly
associated with the sample categories or response variables. A framework is improved/ modified in this report to
find informative gene combinations and to classify gene combinations belonging to its relevant subtype by using
fuzzy logic. The genes are ranked based on their statistical scores and highly informative genes mare filtered. Such
genes are fuzzified to identify 2-gene and 3-gene combinations and the intermediate value for each gene is
calculated to select top gene combinations to further classify gene lymphoma subtypes by using fuzzy rules. Finally
the accuracy of top gene combinations is compared with clustering results. The classification is done using the gene
combinations and it is analyzed to predict the accuracy of the results. The work is implemented using java language
MicroRNA-Disease Predictions Based On Genomic Dataijtsrd
Gene Ontology is a structured library of concepts related with one or more gene products through a process called annotation. Association Rules that discovers biologically relevant and corresponding associations. In the existing system, they used Gene Ontology-based Weighted Association Rules for extracting annotated datasets. We here adapt the MOAL algorithm to mine cross-ontology association rules. Cross ontology rules to manipulate the Protein values from three sub ontologys for identifying the gene attacked disease. It focused on intrinsic and extrinsic values. The Co-Regulatory modules between microRNA, Transcription Factor and gene on function level with multiple genomic data. The regulations are compared with the help of integration technique. Iterative Multiplicative Updating Algorithm is used in our project to solve the optimization module function for the above interactions. Comparing the regulatory modules and protein value for gene and generating Bayesian rose tree for the efficiency of our result. Ajitha. C | DivyaLakshmi. K | Jothi Jayashree. M"MicroRNA-Disease Predictions Based On Genomic Data" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-2 | Issue-3 , April 2018, URL: http://www.ijtsrd.com/papers/ijtsrd11386.pdf http://www.ijtsrd.com/computer-science/data-miining/11386/microrna-disease-predictions-based-on-genomic-data/ajitha-c
Austin Neurology & Neurosciences is an open access, peer reviewed, scholarly journal dedicated to publish articles covering all areas of Neurology & Neurological Sciences.
The journal aims to promote research communications and provide a forum for doctors, researchers, physicians and healthcare professionals to find most recent advances in all areas of Neurology & Neurological Sciences. Austin Neurology & Neurosciences accepts original research articles, reviews, mini reviews, case reports and rapid communication covering all aspects of neurology & neurosciences.
Austin Neurology & Neurosciences strongly supports the scientific up gradation and fortification in related scientific research community by enhancing access to peer reviewed scientific literary works. Austin Publishing Group also brings universally peer reviewed journals under one roof thereby promoting knowledge sharing, mutual promotion of multidisciplinary science.
Comparing Genetic Evolutionary Algorithms on Three Enzymes of HIV-1: Integras...CSCJournals
In this work, we utilized Quantitative Structure-Activity Relationship (QSAR) techniques to develop predictive models for inhibitors of the HIV-1 enzymes Integrase, HIV-Protease, and Reverse Transcriptase. Each predictive model was composed of quantitative drug characteristics that were selected by genetic evolutionary algorithms, such as Genetic Algorithm (GE), Differential Evolutionary Algorithm (DE), Binary Particle Swarm Optimization (BPSO), and Differential Evolution with Binary Particle Swarm Optimization (DE-BPSO). After characteristic selection, each model was tested with machine-learning algorithms such as Multiple Linear Regression (MLR), Support Vector Machine (SVM), and Multi-Layer Perceptron neural networks (MLP/ANN). We found that a combination of DE-BPSO combined with Multi-Layer Perceptron produced the most accurate predictive models as measured by R2, the statistical measure of proportion of variance in prediction values, and root-mean-square-error (RMSE) of prediction values compared to observed values. As for the models themselves: the best predictors for Integrase inhibitor included mass-weighted centred Broto-Moreau autocorrelation values, Moran autocorrelations, and eigenvalues of Burden matrices weighted by I-states; the best predictors for HIV-Protease inhibitors included the second Zagreb index value, the normalized spectral positive sum from Laplace matrix, and the connectivity-like index of order 0 from edge adjacency mat; and the best predictors for Reverse Transcriptase inhibitors included the number of hydrogen atoms, the molecular path count of order 7, the centred Broto-Moreau autocorrelation of lag 2 weighted by Sanderson electronegativity, the P_VSA-like on ionization potential, and the frequency of C – N bonds at topological distance 3.
Particle Swarm Optimization for Gene cluster IdentificationEditor IJCATR
The understanding of gene regulation is the most basic need for the classification of genes within a DNA. These genes
within the DNA are grouped together into clusters also known as Transcription Units. The genes are grouped into transcription units
for the purpose of construction and regulation of gene expression and synthesis of proteins. This knowledge further contributes as
essential information for the process of drug design and to determine the protein functions of newly sequenced genomes. It is possible
to use the diverse biological information across multiple genomes as an input to the classification problem. The purpose of this work is
to show that Particle Swarm Optimization may provide for more efficient classification as compared to other algorithms. To validate
the approach E.Coli complete genome is taken as the benchmark genome.
MicroRNA-Disease Predictions Based On Genomic Dataijtsrd
Gene Ontology is a structured library of concepts related with one or more gene products through a process called annotation. Association Rules that discovers biologically relevant and corresponding associations. In the existing system, they used Gene Ontology-based Weighted Association Rules for extracting annotated datasets. We here adapt the MOAL algorithm to mine cross-ontology association rules. Cross ontology rules to manipulate the Protein values from three sub ontologys for identifying the gene attacked disease. It focused on intrinsic and extrinsic values. The Co-Regulatory modules between microRNA, Transcription Factor and gene on function level with multiple genomic data. The regulations are compared with the help of integration technique. Iterative Multiplicative Updating Algorithm is used in our project to solve the optimization module function for the above interactions. Comparing the regulatory modules and protein value for gene and generating Bayesian rose tree for the efficiency of our result. Ajitha. C | DivyaLakshmi. K | Jothi Jayashree. M"MicroRNA-Disease Predictions Based On Genomic Data" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-2 | Issue-3 , April 2018, URL: http://www.ijtsrd.com/papers/ijtsrd11386.pdf http://www.ijtsrd.com/computer-science/data-miining/11386/microrna-disease-predictions-based-on-genomic-data/ajitha-c
Austin Neurology & Neurosciences is an open access, peer reviewed, scholarly journal dedicated to publish articles covering all areas of Neurology & Neurological Sciences.
The journal aims to promote research communications and provide a forum for doctors, researchers, physicians and healthcare professionals to find most recent advances in all areas of Neurology & Neurological Sciences. Austin Neurology & Neurosciences accepts original research articles, reviews, mini reviews, case reports and rapid communication covering all aspects of neurology & neurosciences.
Austin Neurology & Neurosciences strongly supports the scientific up gradation and fortification in related scientific research community by enhancing access to peer reviewed scientific literary works. Austin Publishing Group also brings universally peer reviewed journals under one roof thereby promoting knowledge sharing, mutual promotion of multidisciplinary science.
Comparing Genetic Evolutionary Algorithms on Three Enzymes of HIV-1: Integras...CSCJournals
In this work, we utilized Quantitative Structure-Activity Relationship (QSAR) techniques to develop predictive models for inhibitors of the HIV-1 enzymes Integrase, HIV-Protease, and Reverse Transcriptase. Each predictive model was composed of quantitative drug characteristics that were selected by genetic evolutionary algorithms, such as Genetic Algorithm (GE), Differential Evolutionary Algorithm (DE), Binary Particle Swarm Optimization (BPSO), and Differential Evolution with Binary Particle Swarm Optimization (DE-BPSO). After characteristic selection, each model was tested with machine-learning algorithms such as Multiple Linear Regression (MLR), Support Vector Machine (SVM), and Multi-Layer Perceptron neural networks (MLP/ANN). We found that a combination of DE-BPSO combined with Multi-Layer Perceptron produced the most accurate predictive models as measured by R2, the statistical measure of proportion of variance in prediction values, and root-mean-square-error (RMSE) of prediction values compared to observed values. As for the models themselves: the best predictors for Integrase inhibitor included mass-weighted centred Broto-Moreau autocorrelation values, Moran autocorrelations, and eigenvalues of Burden matrices weighted by I-states; the best predictors for HIV-Protease inhibitors included the second Zagreb index value, the normalized spectral positive sum from Laplace matrix, and the connectivity-like index of order 0 from edge adjacency mat; and the best predictors for Reverse Transcriptase inhibitors included the number of hydrogen atoms, the molecular path count of order 7, the centred Broto-Moreau autocorrelation of lag 2 weighted by Sanderson electronegativity, the P_VSA-like on ionization potential, and the frequency of C – N bonds at topological distance 3.
Particle Swarm Optimization for Gene cluster IdentificationEditor IJCATR
The understanding of gene regulation is the most basic need for the classification of genes within a DNA. These genes
within the DNA are grouped together into clusters also known as Transcription Units. The genes are grouped into transcription units
for the purpose of construction and regulation of gene expression and synthesis of proteins. This knowledge further contributes as
essential information for the process of drug design and to determine the protein functions of newly sequenced genomes. It is possible
to use the diverse biological information across multiple genomes as an input to the classification problem. The purpose of this work is
to show that Particle Swarm Optimization may provide for more efficient classification as compared to other algorithms. To validate
the approach E.Coli complete genome is taken as the benchmark genome.
Introduction:
Proposed by Meuwissen et al. (2001)
GS is a specialized form of MAS, in which information from genotype data on marker alleles covering the entire genome forms the basis of selection.
The effects associated with all the marker loci, irrespective of whether the effects are significant or not, covering the entire genome are estimated.
The marker effect estimates are used to calculate the genomic estimated breeding values (GEBVs) of different individuals/lines, which form the basis of selection.
Why to go for genomic selection:
Marker-assisted selection (MAS) is well-suited for handling oligogenes and quantitative trait loci (QTLs) with large effects but not for minor QTLs.
MARS attempts to take into account small effect QTLs by combining trait phenotype data with marker genotype data into a combined selection index.
Based on markers showing significant association with the trait(s) and for this reason has been criticized as inefficient
The genomic selection (GS) scheme was to rectify the deficiency of MAS and MARS schemes. The GS scheme utilizes information from genome-wide marker data whether or not their associations with the concerned trait(s) are significant.
GEBV: GenomicEstimated Breeding Values-
The sum total of effects associated with all the marker alleles present in the individual and included in the GS model applied to the population under selection
Calculated on a single individual basis
Gene-assisted genomic selection:
A GS model that uses information about prior known QTLs, the targeted QTLs were accumulated in much higher frequencies than when the standard ridge regression was used
The sum total of effects associated with all the marker alleles present in the individual and included in the GS model applied to the population under selection
Calculated on a single individual basis
Population used:
Training population: used for training of the GS model and for obtaining estimates of the marker-associated effects needed for estimation of GEBVs of individuals/lines in the breeding population.
Breeding population: the population subjected to GS for achieving the desired improvement and isolation of superior lines for use as new varieties/parents of new improved hybrids.
Training population-
large enough: must be representative of the breeding population: max. trait variance with marker : by cluster analysis
should have either equal or comparable LD, LD decay rates with breeding populations
Updated by including individuals/lines from the breeding population
Training more than one generation
Low colinearity between markers is needed since high colinearity tends to reduce prediction accuracy of certain GS models. (colinearity disturbed by recombination)
Introduction:
Proposed by Meuwissen et al. (2001)
GS is a specialized form of MAS, in which information from genotype data on marker alleles covering the entire genome forms the basis of selection.
The effects associated with all the marker loci, irrespective of whether the effects are significant or not, covering the entire genome are estimated.
The marker effect estimates are used to calculate the genomic estimated breeding values (GEBVs) of different individuals/lines, which form the basis of selection.
Why to go for genomic selection:
Marker-assisted selection (MAS) is well-suited for handling oligogenes and quantitative trait loci (QTLs) with large effects but not for minor QTLs.
MARS attempts to take into account small effect QTLs by combining trait phenotype data with marker genotype data into a combined selection index.
Based on markers showing significant association with the trait(s) and for this reason has been criticized as inefficient
The genomic selection (GS) scheme was to rectify the deficiency of MAS and MARS schemes. The GS scheme utilizes information from genome-wide marker data whether or not their associations with the concerned trait(s) are significant.
GEBV: GenomicEstimated Breeding Values-
The sum total of effects associated with all the marker alleles present in the individual and included in the GS model applied to the population under selection
Calculated on a single individual basis
Gene-assisted genomic selection:
A GS model that uses information about prior known QTLs, the targeted QTLs were accumulated in much higher frequencies than when the standard ridge regression was used
The sum total of effects associated with all the marker alleles present in the individual and included in the GS model applied to the population under selection
Calculated on a single individual basis
Population used:
Training population: used for training of the GS model and for obtaining estimates of the marker-associated effects needed for estimation of GEBVs of individuals/lines in the breeding population.
Breeding population: the population subjected to GS for achieving the desired improvement and isolation of superior lines for use as new varieties/parents of new improved hybrids.
Training population-
large enough: must be representative of the breeding population: max. trait variance with marker : by cluster analysis
should have either equal or comparable LD, LD decay rates with breeding populations
Updated by including individuals/lines from the breeding population
Training more than one generation
Low colinearity between markers is needed since high colinearity tends to reduce prediction accuracy of certain GS models. (colinearity disturbed by recombination)
The Fall Blog Festival On WiziQ
Exploring Blogs on fire, Blogging and the Beast, technology, innovation, inspired exchanges, psychology, guest blogging, vlogging.....
Is social media killing the blogosphere with kindness?
Blogging Communities You Need to Know About.....!!
Restoring our rainforests: Bonn Challenge and Forest Landscape RestorationCIFOR-ICRAF
Chetan Kumar of the Global Forest and Climate Change Program
of the International Union for Conservation of Nature (IUCN). Presented at the Asia-Pacific Rainforest Summit http://www.cifor.org/asia-pacific-rainforest-summit/
Classification of Microarray Gene Expression Data by Gene Combinations using ...IJCSEA Journal
Feature selection has attracted a huge amount of interest in both research and application communities of data mining. Among the large amount of genes presented in gene expression data, only a small fraction of them is effective for performing a certain diagnostic test. Hence, one of the major tasks with the gene expression data is to find groups of co regulated genes whose collective expression is strongly associated with the sample categories or response variables. A framework is proposed in this paper to find informative gene combinations and to classify gene combinations belonging to its relevant subtype by using fuzzy logic. The genes are ranked based on their statistical scores and highly informative genes are filtered. Such genes are fuzzified to identify 2-gene and 3-gene combinations and the intermediate value for each gene is calculated to select top gene combinations to further classify gene lymphoma subtypes by using fuzzy rules. Finally the accuracy of top gene combinations is compared with clustering results. The classification is done using the gene combinations and it is analyzed to predict the accuracy of the results. The work is implemented using java language.
RT-PCR and DNA microarray measurement of mRNA cell proliferationIJAEMSJORNAL
For mRNA quantification, RT-PCR and DNA microarrays have been compared in few studies
(RT-PCR). Healing callus of adult and juvenile rats after femur injury was found to be rich in mRNA at
various stages of the healing process. We used both methods to examine ten samples and a total of 26 genes.
Internal DNA probes tagged with 32P were employed in reverse transcription-polymerase chain reaction
(RT-PCR) to identify genes (RT-PCR). Ten Affymetrix® Rat U34A cRNA microarrays were hybridized with
biotin-labeled cRNA generated from mRNA. There was a wide range of correlation coefficients (r) between
RT-PCR and microarray data for each gene. Meaning became genetically unique because of this diversity.
Relatively lowly expressed genes had the highest r values. The distance between PCR primers and
microarray probes was found to be higher than previously assumed, leading to a drop in agreement between
microarray calls and PCR outcomes. Microarray research showed that RT-PCR expression levels for two
genes had a "floor effect." As a result, PCR primers and microarray probes that overlap in mRNA expression
levels can provide good agreement between these two techniques.
Clustering Approaches for Evaluation and Analysis on Formal Gene Expression C...rahulmonikasharma
Enormous generation of biological data and the need of analysis of that data led to the generation of the field Bioinformatics. Data mining is the stream which is used to derive, analyze the data by exploring the hidden patterns of the biological data. Though, data mining can be used in analyzing biological data such as genomic data, proteomic data here Gene Expression (GE) Data is considered for evaluation. GE is generated from Microarrays such as DNA and oligo micro arrays. The generated data is analyzed through the clustering techniques of data mining. This study deals with an implement the basic clustering approach K-Means and various clustering approaches like Hierarchal, Som, Click and basic fuzzy based clustering approach. Eventually, the comparative study of those approaches which lead to the effective approach of cluster analysis of GE.The experimental results shows that proposed algorithm achieve a higher clustering accuracy and takes less clustering time when compared with existing algorithms.
Assessment of immunomolecular_expression_and_prognostic_role_of_tlr7_among_pa...dr.Ihsan alsaimary
Dr. ihsan edan abdulkareem alsaimary
PROFESSOR IN MEDICAL MICROBIOLOGY AND MOLECULAR IMMUNOLOGY
ihsanalsaimary@gmail.com
mobile : 009647801410838
university of basrah - college of medicine - basrah -IRAQ
Now a day’s, pharma research is facing challenges in
deciphering molecular understanding of disease initiation,
progress and establishment as well as performance
assessment of drug molecule on such phases of disease
development. Emerging of next generation sequencing
bases molecular tools were found to be a key method for
creating genome wide genomics landscape of gene
mutations, gene expression and gene regulation events.
Although NGS is a powerful tool for molecular research but
same time it have its own technical challenges. Few major
challenges of NGS based pharmacogenomics is
summarized below
A Classification of Cancer Diagnostics based on Microarray Gene Expression Pr...IJTET Journal
inAbstract— Pattern Recognition (PR) plays an important role in field of Bioinformatics. PR is concerned with processing raw measurement data by a computer to arrive at a prediction that can be used to formulate a decision to be taken. The important problem in which pattern recognition are applied have common that they are too complex to model explicitly. Diverse methods of this PR are used to analyze, segment and manage the high dimensional microarray gene data for classification. PR is concerned with the development of systems that learn to solve a given problem using a set of instances, each instances represented by a number of features. The microarray expression technologies are possible to monitor the expression levels of thousands of genes simultaneously. The microarrays generated large amount of data has stimulate the development of various computational methods to different biological processes by gene expression profiling. Microarray Gene Expression Profiling (MGEP) is important in Bioinformatics, it yield various high dimensional data used in various clinical applications like cancer diagnostics and drug designing. In this work a new schema has developed for classification of unknown malignant tumors into known class. According to this work an new classification scheme includes the transformation of very high dimensional microarray data into mahalanobis space before classification. The eligibility of the proposed classification scheme has proved to 10 commonly available cancer gene datasets, this contains both the binary and multiclass data sets. To improve the performance of the classification gene selection method is applied to the datasets as a preprocessing and data extraction step.
COMPUTATIONAL METHODS FOR FUNCTIONAL ANALYSIS OF GENE EXPRESSIONcsandit
Sequencing projects arising from high throughput technologies including those of sequencing DNA microarrays allowed to simultaneously measure the expression levels of millions of genes of a biological sample as well as annotate and identify the role (function) of those genes. Consequently, to better manage and organize this significant amount of information,
bioinformatics approaches have been developed. These approaches provide a representation and a more 'relevant' integration of data in order to test and validate the hypothesis of researchers throughout the experimental cycle. In this context, this article describes and discusses some of techniques used for the functional analysis of gene expression data.
Similar to Classification of Gene Expression Data by Gene Combination using Fuzzy Logic (20)
Synthetic Fiber Construction in lab .pptxPavel ( NSTU)
Synthetic fiber production is a fascinating and complex field that blends chemistry, engineering, and environmental science. By understanding these aspects, students can gain a comprehensive view of synthetic fiber production, its impact on society and the environment, and the potential for future innovations. Synthetic fibers play a crucial role in modern society, impacting various aspects of daily life, industry, and the environment. ynthetic fibers are integral to modern life, offering a range of benefits from cost-effectiveness and versatility to innovative applications and performance characteristics. While they pose environmental challenges, ongoing research and development aim to create more sustainable and eco-friendly alternatives. Understanding the importance of synthetic fibers helps in appreciating their role in the economy, industry, and daily life, while also emphasizing the need for sustainable practices and innovation.
Operation “Blue Star” is the only event in the history of Independent India where the state went into war with its own people. Even after about 40 years it is not clear if it was culmination of states anger over people of the region, a political game of power or start of dictatorial chapter in the democratic setup.
The people of Punjab felt alienated from main stream due to denial of their just demands during a long democratic struggle since independence. As it happen all over the word, it led to militant struggle with great loss of lives of military, police and civilian personnel. Killing of Indira Gandhi and massacre of innocent Sikhs in Delhi and other India cities was also associated with this movement.
Instructions for Submissions thorugh G- Classroom.pptxJheel Barad
This presentation provides a briefing on how to upload submissions and documents in Google Classroom. It was prepared as part of an orientation for new Sainik School in-service teacher trainees. As a training officer, my goal is to ensure that you are comfortable and proficient with this essential tool for managing assignments and fostering student engagement.
Model Attribute Check Company Auto PropertyCeline George
In Odoo, the multi-company feature allows you to manage multiple companies within a single Odoo database instance. Each company can have its own configurations while still sharing common resources such as products, customers, and suppliers.
Embracing GenAI - A Strategic ImperativePeter Windle
Artificial Intelligence (AI) technologies such as Generative AI, Image Generators and Large Language Models have had a dramatic impact on teaching, learning and assessment over the past 18 months. The most immediate threat AI posed was to Academic Integrity with Higher Education Institutes (HEIs) focusing their efforts on combating the use of GenAI in assessment. Guidelines were developed for staff and students, policies put in place too. Innovative educators have forged paths in the use of Generative AI for teaching, learning and assessments leading to pockets of transformation springing up across HEIs, often with little or no top-down guidance, support or direction.
This Gasta posits a strategic approach to integrating AI into HEIs to prepare staff, students and the curriculum for an evolving world and workplace. We will highlight the advantages of working with these technologies beyond the realm of teaching, learning and assessment by considering prompt engineering skills, industry impact, curriculum changes, and the need for staff upskilling. In contrast, not engaging strategically with Generative AI poses risks, including falling behind peers, missed opportunities and failing to ensure our graduates remain employable. The rapid evolution of AI technologies necessitates a proactive and strategic approach if we are to remain relevant.
Classification of Gene Expression Data by Gene Combination using Fuzzy Logic
1. Vol-1 Issue-2 2015 IJARIIE-ISSN(O)-2395-4396
Classification of Gene Expression Data by Gene
Combination using Fuzzy Logic
Daxa Ghadiya1
, Prof. Ankit Kharwar2
, Prof. Monali gandhi3
1
Student of M.Tech Computer Engineering in Chhotubhai Gopalbhai Patel Institute of Technology,
Bardoli
2
Assistant Professor, Department of Computer Engineering, Chhotubhai Gopalbhai Patel Institute of
Technology, Bardoli
3
Assistant Professor, Department of Computer Engineering, Chhotubhai Gopalbhai Patel Institute of
Technology, Bardoli
ABSTRACT
The goal of microarray experiments is to identify genes that are differentially transcribed with respect to different
biological conditions of cell cultures and samples. Among the large amount of genes presented in gene expression
data, only a small fraction of them is effective for performing a certain diagnostic test. Hence, one of the major
tasks with the gene expression data is to find groups of co regulated genes whose collective expression is strongly
associated with the sample categories or response variables. A framework is improved/ modified in this report to
find informative gene combinations and to classify gene combinations belonging to its relevant subtype by using
fuzzy logic. The genes are ranked based on their statistical scores and highly informative genes mare filtered. Such
genes are fuzzified to identify 2-gene and 3-gene combinations and the intermediate value for each gene is
calculated to select top gene combinations to further classify gene lymphoma subtypes by using fuzzy rules. Finally
the accuracy of top gene combinations is compared with clustering results. The classification is done using the gene
combinations and it is analyzed to predict the accuracy of the results. The work is implemented using java
language.
Keyword: - Feature selection, classification, clustering, T-test, Cancer classification, gene expression, fuzzy,
neural networks, supports vector machines.
1. INTRODUCTION
In today’s world people are flooded by data like scientific data, medical data, financial data and marketing data.
However the Human capacity is limited and a common human cannot deal with such massive amount of data to
extract the information. The solution is to develop automatic techniques and systems – to analyse the data, to
classify the data, to summarize it, to discover and characterize trends in it.
Data mining is the extraction of interesting relations and patterns hidden in the datasets. It combines database
technologies, statistical analysis and machine learning .Today data mining techniques are innovatively utilized in
numerous fields like industry, Commerce and Medicine. [1] The data results generated by data mining techniques
are highly priced by the professionals. For Instance in this dissertation, medical data is used as a fuzzy logic,
classification algorithm.
Genes are fundamental physical and functional inheritance units of every living organism. The coding genes are
templates for synthesis of proteins. Other genes might specify RNA templates as machines for production of
different types of RNAs.
1137 www.ijariie.com 43
2. Vol-1 Issue-2 2015 IJARIIE-ISSN(O)-2395-4396
The process in which DNA is transcribed into mRNA and proteins are produced by translation represents the well-
known central dogma in molecular biology. The first stage is transcription, then the second stage – translation of
mRNA into a sequence of amino acids that compose the protein. When a protein is produced, the corresponding
coding gene is expressed.
The gene expression levels indicate the approximate number of produced RNA copies from corresponding gene,
which means that gene expression level corresponds to the amount of produced proteins. DNA microarray
technology is used to obtain gene expression data experimentally.
One of the most important regulatory functions of proteins is transcription regulation. Proteins, which bind to DNA
sequences and regulate the transcription of DNAs and gene expression, are called transcription factors (TFs). TFs
can inhibit or activate gene expression of the target genes [3]. Besides gene expression data, other data such as
protein-DNA, protein-protein interaction data and microRNAs should be considered for revealing gene regulatory
mechanisms.
2. LITERATURE SURVEY
Microarrays are capable of profiling the gene expression patterns of tens of thousands of genes in a single
experiment. Gene expression data can be a valuable source for understanding the genes and the biological
associations between them. It has high dimension, small samples and the gene selection which means that feature
selection is very important to determine the classification accuracy. The dataset utilized for this work is called
Lymphoma Dataset which includes 4026 gene expression values with its subtypes [1].
There are three types of lymphomas such as diffuse large B-cell lymphoma (DLBCL), follicular lymphoma (FL),
and chronic lymphocytic leukemia (CLL) [1]. The entire data set includes the expression data of 4,026 genes each
measured using a specialized cDNA microarray with its relevant Genbank accession number, Name and Clone IDs.
A part of the dataset is chosen for the proposed work to classify lymphoma subtypes consists of hundred genes with
gene expression values of 62 samples, with a total of 6200 samples and it is called as the Test dataset.
Table 1. A Sample data from Lymphoma Dataset
GENE ID NAME VALUES VALUES VALUES
GENE
312 9X
Anticrime motility factor
receptor Clone=1072873
-0.3000 0.3000 0.5900
GENE
312 6X
2B catalytic subunit
Clone=627173
-0.2200 -1.2100 1.4100
GENE
307 2X
APC Clone=125294 -0.0400 0.1500 0.6800
GENE
306 7X
Probable ATP
Clone=1350869
0.4100 -0.3400 -0.1800
GENE
400 6X
6XSRC-like adapter protein
Clone=701768
1.7600 1.2100 0.9900
Now, next phase is Preprocessing phase. Data pre-processing is an often neglected but important step in the data
mining process. Preprocessing is the process of removal of noisy data and filtering necessary information.
Table 2. Lymphoma Dataset with empty spots
GENE NAME VALUES VALUES VALUES VALUES
GENE
1835X
(Clone=
1357915)
-0.1300 -0.2800 0.0400
GENE
1836X
(Clone=
1358277)
-0.3100 0.1600 0.2500
GENE
1865X
(Clone=
1358064)
-0.1200 0.5200 0.8300
GENE
1933X
(Clone=
1358190)
0.0500 0.2800
GENE (Clone= -0.2600 -0.0900 0.1500
1137 www.ijariie.com 44
3. Vol-1 Issue-2 2015 IJARIIE-ISSN(O)-2395-4396
1932X 1336836)
GENE
1931X
(Clone=
1336983)
-0.5500
The empty spots are filled with nearest values as data and the preprocessed values are given as input to the next
process, called the ranking of genes.
Table 3. Preprocessed Lymphoma Dataset
GENE NAME VALUES VALUES VALUES VALUES
GENE
1835X
(Clone=1357915) -0.1300 -0.2800 -0.2800 0.0400
GENE
1836X
(Clone=1358277) -0.3100 0.1600 0.1600 0.2500
GENE
1865X
(Clone=1358064) -0.1200 0.5200 0.5200 0.8300
GENE
1933X
(Clone=1358190) 0.0500 0.0500 0.0500 0.2800
GENE
1932X
(Clone=1336836) -0.2600 -0.0900 -0.0900 0.1500
GENE
1931X
(Clone=1336983) -0.5500 -0.5500 -0.5500 -0.5500
Now further we can define ranking if gene. Gene ranking simplifies gene expression tests to include only a very
small number of genes rather than thousands of genes. The importance ranking of each gene is done using a feature
ranking measure called T-Test which ranks the genes based on their statistical score.
Table 4. List of genes with T-scores
GENEID T-SCORE
GENE1943 0.2047
GENE880 0.1842
GENE324 0.1785
GENE1557 0.1641
GENE2231 0.1598
GENE289 0.1569
GENE1792 0.1559
GENE910 0.1548
GENE272 0.1547
GENE692 0.1541
After that we can find the finding informative genes and in that it greatly reduces the computational burden and
noise arising from irrelevant genes. Gene 1 means the gene ranked first as Shown in following Table. The set of
informative genes are passed as input to the next phase for fuzzy classification.
Table 5. Informative genes based on their T-scores
GENEID T-SCORE GENE
GENE1943X 0.2047 1
GENE880X 0.1842 2
GENE324X 0.1785 3
GENE1557X 0.1641 4
1137 www.ijariie.com 45
4. Vol-1 Issue-2 2015 IJARIIE-ISSN(O)-2395-4396
GENE2231X 0.1598 5
GENE289X 0.1569 6
GENE1792X 0.1559 7
GENE910X 0.1548 8
GENE272X 0.1547 9
GENE692X 0.1541 10
Now, next phase is Fuzzy classification. In this phase the set of informative genes with gene expression data are
converted into fuzzy values using Type 1 fuzzy. The first step in fuzzification is to take the crisp inputs and covert to
fuzzy values. The second step is to take the fuzzified inputs, and apply them to the antecedents of the fuzzy rules.
The fuzzified informative genes are passed as input to the next process to identify various gene combinations.
Specifically Single gene, Two-gene and three-gene combinations are done with the selected informative genes. The
Single gene, two gene and three gene combinations are identified to classify the lymphoma subtypes such as
DBLCL, FL and CLL. The single gene is identified from the whole informative gene set which consists of 100
genes.
Now apply the intermediate value for the single gene, two gene, three gene combinations and intermediate values
are calculated for top gene combinations to frame fuzzy rules and to classify the lymphoma subtypes in the test
dataset. With using the fuzzy rules we can classify the test dataset.
Gene Combinations
According to the subtype limits Lymphoma dataset there are 62 samples for a gene, out of which 42 samples are of
DLBCL, 9 samples are FL and 11 samples are CLL. A single informative gene is used to classify subtypes in the
test dataset. A single gene GENE3 classified the subtypes of each gene in the dataset, and the count displayed under
the subtypes is the count of DLBCL’s, FL’s and CLL’s classified in the total expression values of a specific gene in
the test dataset. The gene expression values which are not classified as DLBCL, FL and CLL are classified into
other lymphoma subtypes. The single gene GENE3 classification on test dataset is shown in Table.
Table 8.Single Gene Classification Accuracy
Gene Lymphoma Subtypes
DLBCL FL CLL
GENE 50 72% 67% 5%
GENE 55 74% 34% 35%
GENE 84 72% 51% 20%
GENE 96 77% 7% 60%
In Single gene Combinations the best genes are GENE96 and GENE55 in classifying DLBCL subtype and
GENE50 attained 67% accuracy in classifying FLL subtype for Gene (GENE100X) in the test dataset which is
nearer to subtype limit 9.
Next is two gene combination classified lymphoma subtypes for several genes in the test dataset.(GENE3,GENE1),
(GENE23,GENE40), and (GENE3, GENE67) classified DLBCL subtypes of all the hundred genes within the
subtype limit. The two gene classification on test dataset is shown in Table9.
Table 9. Classification Accuracy of two gene combinations
Gene Lymphoma Subtypes
DLBCL FL CLL
(GENE3,GENE1) 62% 24% 58%
(GENE23,GENE40) 64% 36% 43%
1137 www.ijariie.com 46
5. Vol-1 Issue-2 2015 IJARIIE-ISSN(O)-2395-4396
(GENE3,GENE67) 77% 50% 57%
The best two gene combination which classified DLBCL and FLL within subtype limits is(GENE3,GENE67).
The three gene combination classified lymphoma subtypes for several genes in the test dataset. (GENE 98,GENE
89, GENE 32) classified DLBCL and FL accurately within the constraint and it attained 63% accuracy. (GENE1,
GENE2, GENE3) classified DLBCL subtypes of all the hundred genes within the subtype limit and attained 62%
accuracy. (GENE 59, GENE 71, GENE 94) attained 58% accuracy in classifying DLBCL subtype within the limit.
The three gene combinations and its accuracy are as shown in the Table 10.
Table 10. Classification Accuracy of three gene combinations
Gene Lymphoma Subtypes
DLBC
L
FL CLL
(GENE59,GENE71,GENE
94)
58% 31% 54%
(GENE98,GENE89,GENE
32)
63% 50% 80%
(GENE1,GENE2,GENE3) 62% 41% 41%
The three gene combination (GENE 98, GENE 89, and GENE 32) is the best one to classify DLBCL and FL
within the subtype limits. All other combinations can be used to classify DLBCL subtype.
From the experimental results it was found that from the top hundred genes ranked based on Tscores, single gene
selected classified 77% of DLBCL subtypes, 67% of FLL subtypes for all hundred genes in the test dataset, two
gene combinations was found to have 77% of DLBCL subtypes, 50% of FL subtypes for all hundred genes in the
test dataset and three gene combinations was found to have 63% of DLBCL subtypes, 50% of FL subtypes for all
hundred genes in the test dataset.
Finally gene combinations are verified and its correlation is compared with hierarchical clustering approach by
grouping the entire informative genes. Then the classification accuracy of the gene combination is analyzed based
on its efficiency of subtype’s classification such as DLBCL, FL and CLL of the test dataset.
3. Proposed Approach
In this project we are presenting the hybrid framework for classification of microarray gene expression data using
the gene combination with fuzzy logic and support vector machine (SVM) with goal of improving both
classification accuracy and scalability. Here we are using the fuzzy logic approach while feature selection. This is
presented to find informative gene combinations and to classify gene combinations belonging to its relevant subtype
by using fuzzy logic. The genes are ranked based on their statistical scores and highly informative genes are filtered.
Such genes are fuzzified to identify 2-gene and 3-gene combinations and the intermediate value for each gene is
calculated to select top gene combinations. Still to this we can say it as construction of feature descriptors for
classification. The final step in this project is to present the scalable classification framework in which we prepare
training and test datasets using above feature construction method with identified gene combinations. For
classification we are going to use efficient classification method SVM to claim efficiency of our proposed approach.
4. CONCLUSIONS
We can conclude that among the large amount of genes present in gene expression data, only a small fraction of
them is effective for performing classification. Such informative genes are retained by a process called feature
selection. The proposed 2-gene and 3-gene combinations are verified with a clustering approach called Hierarchical
clustering which proved that gene combination taken are good combinations in classifying lymphoma subtypes. The
classification accuracy of gene combination is verified in the final phase. From the experimental results it was found
1137 www.ijariie.com 47
6. Vol-1 Issue-2 2015 IJARIIE-ISSN(O)-2395-4396
that from the top hundred genes ranked based on T-scores, single gene selected classified 77% of DLBCL subtypes,
67% of FLL subtypes for all hundred genes in the test dataset, two gene combinations was found to have 77% of
DLBCL subtypes, 50% of FL subtypes for all hundred genes in the test dataset and three gene combinations was
found to have 63% of DLBCL subtypes, 50% of FL subtypes for all hundred genes in the test dataset. The
evolutionary approaches such as optimization methods can be used to generate best gene combinations to achieve
higher level classification accuracy.
ACKNOWLEDGEMENT
I take this opportunity to express my sincere thanks and deep sense of gratitude to my guide for imparting me
valuable guidance. They helped me by solving many doubts and suggesting many references. They helped me by
giving valuable suggestions and encouragement which not only helped me in preparing this thesis but also in having
a better insight in this field.
REFERENCES
[1] V.Bhuvaneswari,Vanitha, Classification of Microarray Gene Expression Data by Gene Combinations using
Fuzzy Logic (MGC-FL), International Journal of Computer Science, Engineering and Applications (IJCSEA)
Vol.2, No.4, August 2012.
[2] Jin-Hyuk Hong and Sung-Bee Cho, Cancer Classification with Incremental Gene Selection based on DNA
Microarray Data, Computational Intelligence in Bioinformatics and Computational Biology, 2008. CIBCB '08.
IEEE Symposium on 15-17Sept. 2008.
[3] Jin-Hyuk Hong and Sung-Bee Cho, Cancer Classification with Incremental Gene Selection based on DNA
Microarray Data, Computational Intelligence in Bioinformatics and Computational Biology, 2008. CIBCB '08.
IEEE Symposium on 15-17 Sept. 2008.
[4] Lipo Wangand Feng Chu, Extracting Very Simple Diagnostic Rules from Microarray Data, IEEE, Engineering
in Medicine and Biology Society (EMBC), 2010 Annual International Conference on Aug. 31 2010-Sept. 4
2010.
[5] Vaishali P Khobragade, Dr.A.Vinayababu, A Classification of Microarray Gene Expression Data Using Hybrid
Soft Computing Approach, International Journal of Computer Science, Engineering and Applications
(IJCSEA) ,Vol.9, No.2, November 2012.
[6] S.J.Brintha ,V. Bhuvneswari, Clustering Microarray Gene Expression Data Using Type 2 Fuzzy Logic,
International Journal of Computer Science, Engineering and Applications (IEEE), March 30-31 2012.
[7] Chhanda Ray, Cancer Identification and Gene Classification using DNA Microarray Gene Expression Patterns,
International Journal of Computer Science, Engineering and Applications (IEEE) Vol.8, Issue.2, March 2011.
[8] Lipo Wang, Feng Chu, and Wei Xie, Accurate Cancer Classification Using Expressions of Very Few Genes,
Computational Biology and Bioinformatics, IEEE/ACM
Transactions on 20 February 2007.
[9] Yu, S., Zhang, Y., Song, C., Chen, K.: "A security architecture for Mobile Ad Hoc Networks", Proceedings of
Asia-Pasic Advanced Network(APAN), 2004.
[10] UI. Partisans, A survey of models for inference of gene regulatory networks, Non-linear Analysis: Modeling
and Control Vol.18, No.4, Sep. 25 2013.
[11] Sung- Bae Cho, Hong-Hee , Won, Machine Learning in DNA Microarray Analysis for Cancer Classification.
1137 www.ijariie.com 48
7. Vol-1 Issue-2 2015 IJARIIE-ISSN(O)-2395-4396
that from the top hundred genes ranked based on T-scores, single gene selected classified 77% of DLBCL subtypes,
67% of FLL subtypes for all hundred genes in the test dataset, two gene combinations was found to have 77% of
DLBCL subtypes, 50% of FL subtypes for all hundred genes in the test dataset and three gene combinations was
found to have 63% of DLBCL subtypes, 50% of FL subtypes for all hundred genes in the test dataset. The
evolutionary approaches such as optimization methods can be used to generate best gene combinations to achieve
higher level classification accuracy.
ACKNOWLEDGEMENT
I take this opportunity to express my sincere thanks and deep sense of gratitude to my guide for imparting me
valuable guidance. They helped me by solving many doubts and suggesting many references. They helped me by
giving valuable suggestions and encouragement which not only helped me in preparing this thesis but also in having
a better insight in this field.
REFERENCES
[1] V.Bhuvaneswari,Vanitha, Classification of Microarray Gene Expression Data by Gene Combinations using
Fuzzy Logic (MGC-FL), International Journal of Computer Science, Engineering and Applications (IJCSEA)
Vol.2, No.4, August 2012.
[2] Jin-Hyuk Hong and Sung-Bee Cho, Cancer Classification with Incremental Gene Selection based on DNA
Microarray Data, Computational Intelligence in Bioinformatics and Computational Biology, 2008. CIBCB '08.
IEEE Symposium on 15-17Sept. 2008.
[3] Jin-Hyuk Hong and Sung-Bee Cho, Cancer Classification with Incremental Gene Selection based on DNA
Microarray Data, Computational Intelligence in Bioinformatics and Computational Biology, 2008. CIBCB '08.
IEEE Symposium on 15-17 Sept. 2008.
[4] Lipo Wangand Feng Chu, Extracting Very Simple Diagnostic Rules from Microarray Data, IEEE, Engineering
in Medicine and Biology Society (EMBC), 2010 Annual International Conference on Aug. 31 2010-Sept. 4
2010.
[5] Vaishali P Khobragade, Dr.A.Vinayababu, A Classification of Microarray Gene Expression Data Using Hybrid
Soft Computing Approach, International Journal of Computer Science, Engineering and Applications
(IJCSEA) ,Vol.9, No.2, November 2012.
[6] S.J.Brintha ,V. Bhuvneswari, Clustering Microarray Gene Expression Data Using Type 2 Fuzzy Logic,
International Journal of Computer Science, Engineering and Applications (IEEE), March 30-31 2012.
[7] Chhanda Ray, Cancer Identification and Gene Classification using DNA Microarray Gene Expression Patterns,
International Journal of Computer Science, Engineering and Applications (IEEE) Vol.8, Issue.2, March 2011.
[8] Lipo Wang, Feng Chu, and Wei Xie, Accurate Cancer Classification Using Expressions of Very Few Genes,
Computational Biology and Bioinformatics, IEEE/ACM
Transactions on 20 February 2007.
[9] Yu, S., Zhang, Y., Song, C., Chen, K.: "A security architecture for Mobile Ad Hoc Networks", Proceedings of
Asia-Pasic Advanced Network(APAN), 2004.
[10] UI. Partisans, A survey of models for inference of gene regulatory networks, Non-linear Analysis: Modeling
and Control Vol.18, No.4, Sep. 25 2013.
[11] Sung- Bae Cho, Hong-Hee , Won, Machine Learning in DNA Microarray Analysis for Cancer Classification.
1137 www.ijariie.com 48