This document summarizes a study that reconstructed a cancer-specific gene regulatory network for prostate cancer from gene expression profiles. The researchers identified differentially expressed genes between cancer and normal tissue samples using statistical tests. They then computed correlations between gene pairs to identify regulatory relationships, focusing on highly correlated pairs. This resulted in a network of 29 genes and 55 regulatory relationships. The network was validated against biological databases and literature, and topological analysis identified some highly connected "hub" genes that may be potential drug targets.
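The pipeline in this summary (differential expression, pairwise correlation, network, hub genes) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the study's code. The summary does not name the statistical test or any cutoffs. Welch's t statistic, the thresholds `t_cut` and `r_cut`, computing correlations on tumor samples only, and the function names `welch_t`, `pearson` and `build_network` are all hypothetical choices.

```python
import math
import statistics
from itertools import combinations

def welch_t(a, b):
    """Welch's t statistic for two groups of expression values (0.0 if both are constant)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return 0.0 if se == 0 else (statistics.mean(a) - statistics.mean(b)) / se

def pearson(x, y):
    """Pearson correlation of two expression profiles (0.0 if either is constant)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)

def build_network(tumor, normal, t_cut=3.0, r_cut=0.9):
    """tumor, normal: dict mapping gene -> expression values, one per sample."""
    # 1. Differential expression: keep genes whose |t| reaches the (hypothetical) cutoff.
    de = [g for g in tumor if abs(welch_t(tumor[g], normal[g])) >= t_cut]
    # 2. Edges: pairs of DE genes that are highly correlated across tumor samples.
    edges = [(g, h) for g, h in combinations(de, 2)
             if abs(pearson(tumor[g], tumor[h])) >= r_cut]
    # 3. Degree of each gene; the best-connected genes are candidate hubs.
    degree = {g: 0 for g in de}
    for g, h in edges:
        degree[g] += 1
        degree[h] += 1
    hubs = sorted(degree, key=degree.get, reverse=True)
    return de, edges, degree, hubs
```

A correlation edge only marks co-expression. Reading it as a regulatory link, as the study does, requires the database and literature validation described above.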
A Classification of Cancer Diagnostics based on Microarray Gene Expression Pr... (IJTET Journal)
Abstract: Pattern Recognition (PR) plays an important role in the field of bioinformatics. PR is concerned with processing raw measurement data by computer to arrive at a prediction that can be used to formulate a decision. The important problems to which pattern recognition is applied have in common that they are too complex to model explicitly. Diverse PR methods are used to analyze, segment, and manage high-dimensional microarray gene data for classification. PR is concerned with developing systems that learn to solve a given problem from a set of instances, each represented by a number of features. Microarray expression technologies make it possible to monitor the expression levels of thousands of genes simultaneously. The large amount of data generated by microarrays has stimulated the development of various computational methods for studying biological processes through gene expression profiling. Microarray Gene Expression Profiling (MGEP) is important in bioinformatics: it yields high-dimensional data used in clinical applications such as cancer diagnostics and drug design. In this work, a new scheme is developed for classifying unknown malignant tumors into known classes. The proposed scheme transforms the very high-dimensional microarray data into Mahalanobis space before classification. Its suitability is demonstrated on 10 commonly available cancer gene datasets, including both binary and multiclass datasets. To improve classification performance, a gene selection method is applied to the datasets as a preprocessing and data extraction step.
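One common way to move data "into Mahalanobis space" is whitening. You factor the sample covariance as Σ = L·Lᵀ (Cholesky) and map each sample x to z = L⁻¹(x − μ). Squared Euclidean distances between z-vectors then equal squared Mahalanobis distances between the original samples. This is a minimal sketch under assumptions, not the authors' scheme. The abstract does not give the exact transform, the classifier, or how the covariance is estimated. The pooled training covariance, the small `ridge` term on its diagonal, the nearest-centroid classifier, and all function names are hypothetical. It assumes gene selection has already reduced the data to a manageable number of genes.

```python
import math

def mean_vector(X):
    return [sum(row[j] for row in X) / len(X) for j in range(len(X[0]))]

def covariance(X, mu, ridge):
    """Sample covariance (n - 1 denominator) plus `ridge` on the diagonal (an assumption)."""
    n, p = len(X), len(mu)
    S = [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in X) / (n - 1) for j in range(p)]
         for i in range(p)]
    for i in range(p):
        S[i][i] += ridge
    return S

def cholesky(A):
    """Lower-triangular L with A = L L^T, for symmetric positive-definite A."""
    p = len(A)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L

def forward_solve(L, b):
    """Solve L z = b for lower-triangular L."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z

def fit_mahalanobis(X, ridge=1e-6):
    """Return a function mapping a sample x to z = L^{-1}(x - mu)."""
    mu = mean_vector(X)
    L = cholesky(covariance(X, mu, ridge))
    return lambda x: forward_solve(L, [xi - mi for xi, mi in zip(x, mu)])

def nearest_centroid(Z, labels, z):
    """Assign z to the class whose whitened centroid is closest."""
    best, best_d = None, math.inf
    for label in sorted(set(labels)):
        members = [v for v, y in zip(Z, labels) if y == label]
        centroid = [sum(col) / len(members) for col in zip(*members)]
        d = sum((a - b) ** 2 for a, b in zip(z, centroid))
        if d < best_d:
            best, best_d = label, d
    return best
```

Test samples must go through the same `transform` (the same μ and L) fitted on the training data. Otherwise their distances are not comparable.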
Unravelling the molecular linkage of co morbid diseases (eSAT Journals)
Abstract: The incidence of Diabetes Mellitus (DM), Hypertension (HTN), and Coronary Artery Disease (CAD) in the country has increased alarmingly. For decades, DM and HTN have been established as independent risk factors for CAD. Genes and their regulatory action through proteins are vital for normal metabolism, and any abnormality in this regulation can lead to disease. Our study used the principles of network biology to understand the comorbidity of these diseases at the molecular level. We collected disease genes for DM, HTN, and CAD from various public databases and extracted the genes common to all three diseases. We constructed a biological network from protein interaction data obtained from the Human Protein Reference Database (HPRD). The network was validated by checking that its degree distribution follows a power law, and the genes were ranked using Centiscape. Finally, we identified, with literature validation, the crucial genes that could play a major role in disease comorbidity. Keywords: Biological Network, Coronary Artery Disease, Diabetes Mellitus, Hypertension, Systems Biology
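The two network steps in this abstract (a power-law check on the degree distribution, then centrality-based ranking) can be illustrated on a plain edge list. This is a sketch under assumptions, not Centiscape itself. Centiscape computes many centralities; only degree and closeness are shown here. The least-squares slope of log P(k) against log k is a rough check only; maximum-likelihood power-law fitting is more rigorous. The function names are hypothetical, and `loglog_slope` needs at least two distinct degree values.

```python
import math
from collections import Counter, defaultdict, deque

def adjacency(edges):
    """Undirected adjacency sets from (protein, protein) interaction pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def loglog_slope(adj):
    """Least-squares slope of log P(k) versus log k; roughly -gamma for P(k) ~ k^-gamma."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    n = len(adj)
    pts = [(math.log(k), math.log(c / n)) for k, c in counts.items()]
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    num = sum((x - mx) * (y - my) for x, y in pts)
    den = sum((x - mx) ** 2 for x, _ in pts)
    return num / den

def closeness(adj, source):
    """Closeness centrality of `source` within its connected component (BFS distances)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

def rank_genes(edges):
    """Rank genes by degree, breaking ties by closeness."""
    adj = adjacency(edges)
    return sorted(adj, key=lambda g: (len(adj[g]), closeness(adj, g)), reverse=True)
```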
IJRET: International Journal of Research in Engineering and Technology is an international, peer-reviewed online journal published by eSAT Publishing House to advance research in various disciplines of engineering and technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching, and research in engineering and technology. We bring together scientists, academicians, field engineers, scholars, and students of related fields of engineering and technology.
DISCOVERING DIFFERENCES IN GENDER-RELATED SKELETAL MUSCLE AGING THROUGH THE M... (ijbbjournal)
Understanding gene function (GF) is still a significant challenge in systems biology. Several machine learning and computational techniques have previously been used to understand GF. However, these attempts have not produced a comprehensive interpretation of the relationship between genes and differences in both age and gender. Although there are several thousand genes, very few differentially expressed genes play an active role in understanding age and gender differences. The core aim of this study is to uncover new biomarkers that can help distinguish between males and females according to the gene expression levels of skeletal muscle (SM) tissues. In our proposed multi-filter system (MFS), genes are first sorted using three different ranking techniques (t-test, Wilcoxon, and Receiver Operating Characteristic (ROC)). Important genes are then selected by majority voting, based on the principle that combining multiple models can improve the generalization of the system. Experiments were conducted on a microarray gene expression dataset, and the results indicate a significant increase in classification accuracy compared with the existing system.
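The multi-filter idea (three rankings, then a majority vote) can be sketched as follows. This is a minimal illustration, not the authors' MFS. The abstract does not say how many genes each filter keeps or how scores are computed. The cutoff `k`, the `min_votes=2` majority rule, the direction-agnostic scores, and all function names are assumptions.

```python
import math
import statistics

def t_score(a, b):
    """|Welch t| between two groups (0.0 if both groups are constant)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return abs(statistics.mean(a) - statistics.mean(b)) / se if se else 0.0

def mann_whitney_u(a, b):
    """U statistic for group a (ties count 0.5); the basis of the Wilcoxon rank-sum test."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)

def wilcoxon_score(a, b):
    """Distance of U from its no-difference value n_a * n_b / 2."""
    return abs(mann_whitney_u(a, b) - len(a) * len(b) / 2)

def auc_score(a, b):
    """ROC AUC of the gene as a one-gene classifier, ignoring direction."""
    auc = mann_whitney_u(a, b) / (len(a) * len(b))
    return max(auc, 1 - auc)

def top_k(scores, k):
    return set(sorted(scores, key=scores.get, reverse=True)[:k])

def majority_vote(male, female, k=2, min_votes=2):
    """male, female: dict gene -> expression values. Keep genes chosen by >= min_votes filters."""
    rankers = (t_score, wilcoxon_score, auc_score)
    selections = [top_k({g: f(male[g], female[g]) for g in male}, k) for f in rankers]
    votes = {g: sum(g in s for s in selections) for g in male}
    return sorted(g for g, v in votes.items() if v >= min_votes)
```

In this sketch the rank-sum and AUC scores are both monotone in the same U statistic (AUC = U / (n_a·n_b)), so they always agree. The vote then mainly arbitrates between the parametric t-test and the rank-based filters. The original work may compute these filters differently.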
Clustering Approaches for Evaluation and Analysis on Formal Gene Expression C... (rahulmonikasharma)
The enormous generation of biological data, and the need to analyze it, led to the emergence of the field of bioinformatics. Data mining is used to analyze such data by exploring its hidden patterns. Although data mining can be applied to many kinds of biological data, such as genomic and proteomic data, Gene Expression (GE) data is considered here for evaluation. GE data is generated from microarrays such as DNA and oligonucleotide microarrays and is analyzed here with data mining clustering techniques. This study implements the basic K-Means clustering approach and various other approaches, including hierarchical, SOM, CLICK, and a basic fuzzy clustering approach. Finally, a comparative study of these approaches identifies the most effective approach for cluster analysis of GE data. The experimental results show that the proposed algorithm achieves higher clustering accuracy and takes less clustering time than existing algorithms.
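The basic K-Means approach mentioned here alternates two steps. First, assign each expression profile to its nearest centroid. Second, move each centroid to the mean of its members. Stop when the assignments no longer change. This is a minimal sketch, not the study's implementation. The deterministic farthest-first seeding (a simplification of k-means++), the iteration cap, and the function name are assumptions. Random seeding alone can converge to poor local optima.

```python
import math
import random

def kmeans(points, k, iters=100, seed=0):
    """Cluster expression profiles (lists of floats) into k groups with Lloyd's algorithm."""
    rng = random.Random(seed)
    # Farthest-first seeding: start from a random profile, then repeatedly add the
    # profile farthest from every centroid chosen so far.
    centroids = [list(rng.choice(points))]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```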
Comparing Genetic Evolutionary Algorithms on Three Enzymes of HIV-1: Integras... (CSCJournals)
In this work, we used Quantitative Structure-Activity Relationship (QSAR) techniques to develop predictive models for inhibitors of the HIV-1 enzymes Integrase, HIV-Protease, and Reverse Transcriptase. Each predictive model was composed of quantitative drug characteristics selected by evolutionary algorithms: Genetic Algorithm (GA), Differential Evolutionary Algorithm (DE), Binary Particle Swarm Optimization (BPSO), and Differential Evolution with Binary Particle Swarm Optimization (DE-BPSO). After characteristic selection, each model was tested with machine-learning algorithms such as Multiple Linear Regression (MLR), Support Vector Machine (SVM), and Multi-Layer Perceptron neural networks (MLP/ANN). We found that DE-BPSO combined with a Multi-Layer Perceptron produced the most accurate predictive models. Accuracy was measured by R², the proportion of variance in the observed values explained by the predictions, and by the root-mean-square error (RMSE) of the predicted values relative to the observed values. As for the models themselves: the best predictors for Integrase inhibitors included mass-weighted centred Broto-Moreau autocorrelation values, Moran autocorrelations, and eigenvalues of Burden matrices weighted by I-states; the best predictors for HIV-Protease inhibitors included the second Zagreb index, the normalized spectral positive sum from the Laplace matrix, and the connectivity-like index of order 0 from the edge adjacency matrix; and the best predictors for Reverse Transcriptase inhibitors included the number of hydrogen atoms, the molecular path count of order 7, the centred Broto-Moreau autocorrelation of lag 2 weighted by Sanderson electronegativity, the P_VSA-like descriptor on ionization potential, and the frequency of C–N bonds at topological distance 3.
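The two accuracy measures named in this abstract have standard definitions: R² = 1 − SS_res / SS_tot and RMSE = sqrt(mean squared residual). A short sketch follows. It does not reproduce the descriptor selection or the MLR/SVM/MLP models, and the function names are illustrative.

```python
import math

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

def rmse(observed, predicted):
    """Root-mean-square error of predictions against observed activities."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))
```

For observed values [1, 2, 3, 4] and predictions [1.1, 1.9, 3.2, 3.8], the squared residuals sum to 0.1 and SS_tot is 5. That gives R² = 0.98 and RMSE = sqrt(0.025) ≈ 0.158.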
Presentation for Network Biology SIG 2013 by Thomas Kelder, Bioinformatics Scientist at TNO in The Netherlands. “Functional Network Signatures Link Anti-diabetic Interventions with Disease Parameters”
Integrative bioinformatics analysis of Parkinson's disease related omics data (Enrico Glaab)
Presentation on the statistical meta-analysis of omics data from Parkinson's disease case-control studies. The results are used for a comparative analysis against aging-related omics alterations in the brain and for prioritizing new candidate disease genes using the phenologs approach.
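The presentation does not state which meta-analysis method it uses. Fisher's method is one standard way to combine per-study p-values for a gene. Under the null hypothesis, X = −2·Σ ln pᵢ follows a chi-square distribution with 2k degrees of freedom. For even degrees of freedom its survival function has the closed form e^(−X/2)·Σ_{i<k} (X/2)^i / i!. The sketch below assumes independent studies and p-values in (0, 1]; the function name is illustrative.

```python
import math

def fisher_combined_p(p_values):
    """Combine independent p-values for one gene across k studies (Fisher's method)."""
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # X / 2
    # Survival function of chi-square with 2k d.o.f.: exp(-X/2) * sum_{i<k} (X/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total
```

For example, two studies that each report p = 0.05 combine to about 0.0175.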
EG-CompBio presentation about Artificial Intelligence in Bioinformatics covering:
-AI (Types, Development)
-Deep Learning (Architecture)
-Bioinformatics Fields
-Input formats for AI
-AI Challenges in Biology
-Example: (Proteomics, Transcriptomics)
-Metagenomics: @ NU
-Taxonomic Classification
-Phenotype Classification
-How to begin in AI in Bioinformatics
Application of Microarray Technology and softcomputing in cancer Biology (CSCJournals)
DNA microarray technology has emerged as a boon to the scientific community, both for understanding the growth and development of life and for exploring the genetic causes of anomalies in the functioning of the human body. Microarray technology enables biologists to monitor the expression of thousands of genes in a single experiment on a small chip. Extracting useful knowledge and information from microarray data has attracted the attention of many biologists and computer scientists. Knowledge engineering has revolutionized the way medical data is analyzed. Soft computing is a branch of computer science capable of analyzing complex medical data. Advances in microarray-based expression analysis have led to the promise of cancer diagnosis using new molecular approaches. Many studies and methodologies analyze gene expression data with data mining techniques such as feature selection, classification, and clustering, embedding soft computing methods for greater accuracy. This review examines recent advances in cancer research using DNA microarray technology, data mining, and soft computing techniques.
Majority Voting Approach for the Identification of Differentially Expressed G... (csandit)
Understanding gene function (GF) is still a significant challenge in systems biology. Several machine learning and computational techniques have previously been used to understand GF. However, these attempts have not produced a comprehensive interpretation of the relationship between genes and differences in both age and gender. Although there are several thousand genes, very few differentially expressed genes play an active role in understanding age and gender differences. The core aim of this study is to uncover new biomarkers that can help distinguish between males and females according to the gene expression levels of skeletal muscle (SM) tissues. In our proposed multi-filter system (MFS), genes are first sorted using three different ranking techniques (t-test, Wilcoxon, and ROC). Important genes are then selected by majority voting, based on the principle that combining multiple models can improve the generalization of the system. Experiments were conducted on a microarray gene expression dataset, and the results indicate a significant increase in classification accuracy compared with the existing system.
Sample Work For Engineering Literature Review and Gap Identification (PhD Assistance): http://bit.ly/2E9fAVq
2.1 INTRODUCTION
2.2 RESEARCH GAPS IN EXISTING METHODS
2.3 OBJECTIVES OF THIS WORK
Read More : http://bit.ly/2Rl7XT5
COMPUTATIONAL METHODS FOR FUNCTIONAL ANALYSIS OF GENE EXPRESSION (csandit)
High-throughput technologies, including DNA sequencing and DNA microarrays, make it possible to measure the expression levels of thousands of genes in a biological sample simultaneously, and to annotate and identify the role (function) of those genes. Consequently, bioinformatics approaches have been developed to better manage and organize this large amount of information. These approaches provide a representation and a more relevant integration of the data, so that researchers can test and validate their hypotheses throughout the experimental cycle. In this context, this article describes and discusses some of the techniques used for the functional analysis of gene expression data.
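The abstract does not list the specific techniques the article covers. One widely used functional-analysis technique is over-representation analysis. It asks whether a list of differentially expressed genes contains more genes annotated to a functional term than chance would predict, using a one-sided hypergeometric test. The sketch below is an illustrative assumption, not taken from the article. The parameter names follow the usual N, K, n, k notation.

```python
from math import comb

def enrichment_p(hits_in_list, list_size, genes_in_term, genome_size):
    """One-sided hypergeometric p-value P(X >= hits_in_list).

    genome_size (N): genes measured; genes_in_term (K): genes annotated to the term;
    list_size (n): genes in the differential list; hits_in_list (k): overlap.
    """
    upper = min(list_size, genes_in_term)
    total = comb(genome_size, list_size)
    return sum(comb(genes_in_term, i) * comb(genome_size - genes_in_term, list_size - i)
               for i in range(hits_in_list, upper + 1)) / total
```

With many terms tested at once, these p-values would normally be corrected for multiple testing before interpretation.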
Genome-wide transcription profiling is a powerful technique for studying the footprints of disease susceptibility. Moreover, when applied to diseased tissue, it may reveal quantitative and qualitative alterations in gene expression that inform the context or underlying basis of the disease, and it may provide a new diagnostic approach. However, the data obtained from high-density microarrays is highly complex and poses considerable data-mining challenges. Past research has shown that neurological diseases damage brain network interactions, protein-protein interactions, and gene-gene interactions. A number of neurological research papers also analyze the relationships among the damaged parts. Analyzing a gene-gene interaction network built from a state-of-the-art gene database of Alzheimer's patients can therefore yield a great deal of information. In this paper, we used gene datasets from Alzheimer's disease patients and from normal subjects, obtained from the NCBI databank. After processing the Affymetrix .CEL data using RMA, we use the processed data to find gene interaction outputs. We then filter the output files using the probe-set filtering attributes p-value and fold change, draw a gene-gene interaction network, and analyze it using the GeneMania software.
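The probe-set filtering step can be sketched as a simple two-threshold filter. RMA output is on a log2 scale, so the log2 fold change is a difference of group means. The abstract does not give its cutoffs. The values `p_cut=0.05` and `log2fc_cut=1.0` (a two-fold change), the row layout, and the function name are hypothetical, and the p-values are assumed to be computed beforehand.

```python
def filter_probesets(rows, p_cut=0.05, log2fc_cut=1.0):
    """rows: (probe_id, p_value, mean_log2_disease, mean_log2_control) tuples.

    Returns (probe_id, log2 fold change) for probe sets passing both filters.
    """
    kept = []
    for probe, p, disease, control in rows:
        log2fc = disease - control  # RMA values are already log2, so this is log2(ratio)
        if p < p_cut and abs(log2fc) >= log2fc_cut:
            kept.append((probe, log2fc))
    return kept
```

Probe sets that pass both filters would then be mapped to gene symbols before building the interaction network.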
EFFICACY OF NON-NEGATIVE MATRIX FACTORIZATION FOR FEATURE SELECTION IN CANCER... (IJDKP)
Over the past few years, microarray technology has spread considerably across many areas of biology, particularly in the study of cancers such as leukemia, prostate cancer, and colon cancer. The primary bottleneck in properly understanding such datasets is their dimensionality, so substantially reducing their dimension is necessary for studying them efficiently and effectively. This study proposes different algorithms and approaches for reducing the dimensionality of such microarray datasets. It exploits the matrix-like structure of microarray data and uses a popular technique called Non-Negative Matrix Factorization (NMF) to reduce dimensionality, primarily for biological data. Classification accuracies are then compared across these algorithms; this technique achieves an accuracy of 98%.
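NMF approximates a non-negative genes × samples matrix V as W·H. W is genes × r and H is r × samples, with r much smaller than the number of genes. Each sample can then be represented by its r-value column of H instead of thousands of expression values. The abstract does not specify the NMF variant, the rank, or the classifier. This sketch uses the classic Lee-Seung multiplicative updates for the Frobenius objective, H ← H∘(WᵀV)/(WᵀWH) and W ← W∘(VHᵀ)/(WHHᵀ). The function names, `eps`, and the random initialization are assumptions. V must have no negative entries.

```python
import random

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def nmf(V, r, iters=200, seed=0, eps=1e-9):
    """Factor non-negative V (genes x samples) as W (genes x r) @ H (r x samples)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(r)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(r)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H); eps guards against division by zero.
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(r)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(r)] for i in range(n)]
    return W, H

def frobenius_error(V, W, H):
    """Frobenius norm of V - W @ H."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx)) ** 0.5
```

Because the updates only multiply by non-negative ratios, W and H stay non-negative. This keeps the reduced features interpretable as additive combinations of gene groups.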
Mining of Important Informative Genes and Classifier Construction for Cancer ... (ijsc)
Microarray is a useful technique for simultaneously measuring the expression of thousands or more genes. One of the challenges in classifying cancer from high-dimensional gene expression data is to select a minimal number of relevant genes that maximizes classification accuracy. Because of the distinct characteristics inherent to specific cancerous gene expression profiles, developing flexible and robust gene identification methods is fundamental. Many gene selection methods, and corresponding classifiers, have been proposed. In the proposed method, a single gene with high class-discrimination capability is selected, and classification rules for cancer are generated from gene expression profiles. The method first computes an importance factor for each gene in the experimental cancer dataset. The factor counts the number of linguistic terms (defined in terms of different discrete quantities) with high class-discrimination capability, according to their degree of dependency on the classes. The genes with high importance factors are selected as the initial important genes and form the initial reduct. Next, the traditional k-means clustering algorithm is applied to each selected gene of the initial reduct, and the misclassification error of each individual gene is computed. The final reduct is formed by selecting the most important genes, those with the lowest misclassification errors. A classifier is then constructed from decision rules induced by the selected important (single) genes in the training dataset. It classifies the cancerous and non-cancerous samples of the experimental test dataset. The proposed method is tested on four publicly available cancer gene expression datasets. In most cases, accurate classification outcomes are obtained using just the important (single) genes identified, which are highly correlated with cancer pathogenesis.
To demonstrate the robustness of the proposed method, its outcomes (correctly classified instances) are compared with those of some well-known existing classifiers.
MINING OF IMPORTANT INFORMATIVE GENES AND CLASSIFIER CONSTRUCTION FOR CANCER ...ijsc
Microarray is a useful technique for measuring expression data of thousands or more of genes
simultaneously. One of challenges in classification of cancer using high-dimensional gene expression data
is to select a minimal number of relevant genes which can maximize classification accuracy. Because of the
distinct characteristics inherent to specific cancerous gene expression profiles, developing flexible and
robust gene identification methods is extremely fundamental. Many gene selection methods as well as their
corresponding classifiers have been proposed. In the proposed method, a single gene with high classdiscrimination
capability is selected and classification rules are generated for cancer based on gene
expression profiles. The method first computes importance factor of each gene of experimental cancer
dataset by counting number of linguistic terms (defined in terms of different discreet quantity) with high
class discrimination capability according to their depended degree of classes. Then initial important genes
are selected according to high importance factor of each gene and form initial reduct. Then traditional kmeans
clustering algorithm is applied on each selected gene of initial reduct and compute missclassification
errors of individual genes. The final reduct is formed by selecting most important genes with
respect to less miss-classification errors. Then a classifier is constructed based on decision rules induced
by selected important genes (single) from training dataset to classify cancerous and non-cancerous samples
of experimental test dataset. The proposed method test on four publicly available cancerous gene
expression test dataset. In most of cases, accurate classifications outcomes are obtained by just using
important (single) genes that are highly correlated with the pathogenesis cancer are identified. Also to
prove the robustness of proposed method compares the outcomes (correctly classified instances) with some
existing well known classifiers.
GRAPHICAL MODEL AND CLUSTERINGREGRESSION BASED METHODS FOR CAUSAL INTERACTION...ijaia
The early detection of Breast Cancer, the deadly disease that mostly affects women is extremely complex because it requires various features of the cell type. Therefore, the efficient approach to diagnosing Breast Cancer at the early stage was to apply artificial intelligence where machines are simulated with intelligence and programmed to think and act like a human. This allows machines to passively learn and find a pattern, which can be used later to detect any new changes that may occur. In general, machine learning is quite useful particularly in the medical field, which depends on complex genomic measurements such as microarray technique and would increase the accuracy and precision of results. With this technology, doctors can easily diagnose patients with cancer quickly and apply the proper treatment in a timely manner. Therefore, the goal of this paper is to address and propose a robust Breast Cancer diagnostic system using complex genomic analysis via microarray technology. The system will combine two machine learning methods, K-means cluster, and linear regression.
GRAPHICAL MODEL AND CLUSTERINGREGRESSION BASED METHODS FOR CAUSAL INTERACTION...gerogepatton
The early detection of Breast Cancer, the deadly disease that mostly affects women is extremely complex because it requires various features of the cell type. Therefore, the efficient approach to diagnosing Breast Cancer at the early stage was to apply artificial intelligence where machines are simulated with intelligence and programmed to think and act like a human. This allows machines to passively learn and find a pattern, which can be used later to detect any new changes that may occur. In general, machine learning is quite useful particularly in the medical field, which depends on complex genomic measurements such as microarray technique and would increase the accuracy and precision of results. With this technology, doctors can easily diagnose patients with cancer quickly and apply the proper treatment in a timely manner. Therefore, the goal of this paper is to address and propose a robust Breast Cancer diagnostic system using complex genomic analysis via microarray technology. The system will combine two machine learning methods, K-means cluster, and linear regression.
Graphical Model and Clustering-Regression based Methods for Causal Interactio...gerogepatton
The early detection of Breast Cancer, the deadly disease that mostly affects women is extremely complex
because it requires various features of the cell type. Therefore, the efficient approach to diagnosing Breast
Cancer at the early stage was to apply artificial intelligence where machines are simulated with
intelligence and programmed to think and act like a human. This allows machines to passively learn and
find a pattern, which can be used later to detect any new changes that may occur. In general, machine
learning is quite useful particularly in the medical field, which depends on complex genomic
measurements such as microarray technique and would increase the accuracy and precision of results.
With this technology, doctors can easily diagnose patients with cancer quickly and apply the proper
treatment in a timely manner. Therefore, the goal of this paper is to address and propose a robust Breast
Cancer diagnostic system using complex genomic analysis via microarray technology. The system will
combine two machine learning methods, K-means cluster, and linear regression.
SURVEY ON MODELLING METHODS APPLICABLE TO GENE REGULATORY NETWORKijbbjournal
Gene Regulatory Network (GRN) plays an important role in knowing insight of cellular life cycle. It gives
information about at which different environmental conditions genes of particular interest get over
expressed or under expressed. Modelling of GRN is nothing but finding interactive relationships between
genes. Interaction can be positive or negative. For inference of GRN, time series data provided by
Microarray technology is used. Key factors to be considered while constructing GRN are scalability,
robustness, reliability and maximum detection of true positive interactions between genes. This paper
gives detailed technical review of existing methods applied for building of GRN along with scope for
future work.
Technology R&D Theme 2: From Descriptive to Predictive NetworksAlexander Pico
National Resource for Networks Biology's TR&D Theme 2: Genomics is mapping complex data about human biology and promises major medical advances. However, the routine use of genomics data in medical research is in its infancy, due mainly to the challenges of working with highly complex “big data”. In this theme, we will use network information to help organize, analyze and integrate these data into models that can be used to make clinically relevant diagnoses and predictions about an individual.
International Journal on Bioinformatics & Biosciences (IJBB) Vol.3, No.2, June 2013
DOI: 10.5121/ijbb.2013.3203
RECONSTRUCTION AND ANALYSIS OF CANCER-SPECIFIC GENE REGULATORY NETWORKS FROM GENE EXPRESSION PROFILES
Khalid Raza¹* and Rajni Jaiswal²
¹Department of Computer Science, Jamia Millia Islamia (Central University), New Delhi-110025, India. kraza@jmi.ac.in
²Department of Computer Science, Jamia Hamdard, New Delhi-110062, India.
ABSTRACT
The main goal of systems biology research is to reconstruct biological networks and analyze their topology so that the reconstructed networks can be used to identify various kinds of disease. The availability of high-throughput data generated by microarray experiments has encouraged researchers to use whole-genome gene expression profiles to understand cancer and to reconstruct key cancer-specific gene regulatory networks, and researchers are now taking a keen interest in developing algorithms for reconstructing gene regulatory networks from whole-genome expression profiles. In this study, a cancer-specific gene regulatory network (prostate cancer) was constructed using a simple and novel statistics-based approach. First, significant genes that are differentially expressed in the disease condition were identified using a two-stage filtering approach: a t-test followed by a fold-change measure. Next, regulatory relationships between the identified genes were computed using the Pearson correlation coefficient. The results were validated against available databases and the literature. We obtained a cancer-specific regulatory network of 29 genes with a total of 55 regulatory relations, in which some genes were identified as hub genes that may act as drug targets and aid cancer diagnosis.
KEYWORDS
Gene regulatory network, microarray analysis, prostate cancer, differentially expressed genes
1. INTRODUCTION
Microarray technology allows researchers to measure the expression of thousands of genes simultaneously. In the human body, all cells contain the same genetic material, but a given gene may be active in some cells and inactive in others. This variation in gene activation helps researchers understand more about the function of cells, and microarray technology gives insight into many different diseases, such as various cancers, heart disease, mental illness and infectious diseases [1]. Gene expression is the process by which cells create functional gene products (such as RNA and proteins) from the information stored in genes (DNA), and gene regulation refers to the mechanisms that control this process. Gene expression data are widely used for disease analysis and diagnosis, and microarray gene expression data play a significant role in cancer prediction and diagnosis. These data are characterized by many variables (genes) measured on only a few observations (experiments) because of experimental limitations [1]. They provide great opportunities to explore large-scale regulatory networks for various purposes, such as identifying specific genes that cause a particular disease so that researchers can target those genes, understanding interactions among transcription factors and drug targets, and understanding metabolism [2].
Gene regulatory networks (GRNs) are biological networks that describe interactions among genes in the form of a graph, in which nodes represent genes and edges represent their regulatory interactions. Understanding GRNs helps in understanding interactions among genes and the effects of biological and environmental factors, and in identifying target genes for drugs against diseases [3]. GRNs have proved to be a very useful tool for describing and explaining complex dependencies between key developmental transcription factors (TFs), their target genes and regulators [4][5]. GRN reconstruction is the development of a network model from available datasets. The reconstructed GRN explicitly represents the developmental or regulatory process, which is of great interest today. Reconstruction has become a challenging computational problem for researchers seeking to understand complex regulatory mechanisms in biological systems, and every method for inferring GRNs from microarray gene expression profiles has both strengths and weaknesses. In this study, we constructed and analysed a prostate cancer GRN.
Prostate cancer is a slow-growing cancer that develops in the prostate and can spread to other parts of the body, such as the bones and lymph nodes. It has been reported to be the second leading cause of cancer-related death in the United States [6] and the sixth in the world [7]. This cancer is most common in developed countries, with growing rates in developing countries.
Monitoring gene expression with microarrays is considered one of the most promising techniques for discovering GRNs, as it makes GRN inference feasible. However, inferring GRNs from time-series microarray gene expression data involves the following challenges: (i) the number of genes involved is very large compared with the number of samples or time points, (ii) the observed data contain a significant amount of noise, and (iii) gene interactions display complex (nonlinear and dynamic) relationships [2,3,8,9].
2. RELATED WORK
Gene regulatory network models can be used to improve our understanding of gene interactions and to explain environmental and drug effects. GRN models can broadly be divided into two types: those that use discrete variables and those that use continuous variables [5,10]. Models with discrete variables assume that genes exist only in discrete states. Boolean networks implement this approximation: each gene is either active (1) or inactive (0). Boolean networks are of limited realism because information is lost during discretization [11,5]. Bayesian models also discretize the variables and estimate probabilistic relationships between the genes in the network. The structure of such a GRN is modeled as a directed acyclic graph (DAG), for which the conditional dependence of each gene's expression level on its parents and the corresponding probability distribution are estimated. These networks are not suited to handling time-series gene expression data or other temporal information [5,11,12,13]. The most popular models that use continuous variables are based on ordinary differential equations (ODEs), which model the concentrations of RNAs, proteins and other molecules as nonnegative real-valued variables. A disadvantage of these numerical techniques is that measurements of the kinetic parameters in the rate equations are usually lacking [5,13].
Most earlier work on GRN reconstruction was done on smaller organisms with small genomes, and only a few attempts have been made to reconstruct GRNs for human diseases. Wang and Gotoh [27] inferred a directed cancer-specific GRN from microarray data using soft-computing rules. They studied a colon cancer dataset of 2000 genes and 62 samples and analyzed only 18 annotated genes. Basso et al. [28] reported the reconstruction of gene regulatory networks from gene expression profiles of human B cells; the resulting network was scale-free, with a few hub genes. Jiang et al. [29] identified multiple disease pathways from genes extracted by supervised learning of genome-wide transcriptional profiles of patients and normal samples. A pairwise relevance metric, the adjusted frequency value, was used to describe the degree of genetic relationship between two molecular determinants. Applied to a colon cancer microarray dataset, the method yielded a colon-cancer-specific gene network that captures the most central genetic interactions, and topological analysis of the inferred network revealed three known hub cancer genes. Extensive reviews of GRN modelling can be found in [4, 8, 10, 11, 13, 14]. In the present study, we aim to identify hub (highly connected) genes and to analyze the topological behaviour of a network constructed from a prostate cancer dataset.
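To make the discrete (Boolean) formulation concrete, the following minimal Python sketch simulates a Boolean network with synchronous updates. The three genes A, B and C and their update rules are hypothetical, invented purely for illustration; they are not part of the prostate cancer network studied here.

```python
from typing import Callable, Dict, List

State = Dict[str, int]  # gene name -> 1 (active) or 0 (inactive)

# Hypothetical regulatory logic, invented for illustration only.
RULES: Dict[str, Callable[[State], int]] = {
    "A": lambda s: int(not s["C"]),          # C represses A
    "B": lambda s: s["A"],                   # A activates B
    "C": lambda s: int(s["A"] and s["B"]),   # A and B jointly activate C
}


def step(state: State) -> State:
    """Synchronous update: every gene is recomputed from the previous state."""
    return {gene: rule(state) for gene, rule in RULES.items()}


def trajectory(state: State, n_steps: int) -> List[State]:
    """Return the initial state followed by n_steps successive states."""
    states = [state]
    for _ in range(n_steps):
        state = step(state)
        states.append(state)
    return states


if __name__ == "__main__":
    for s in trajectory({"A": 1, "B": 0, "C": 0}, 5):
        print(s)
```

Starting from A=1, B=0, C=0, the state returns to its starting point after five updates, so this toy network settles into a cyclic attractor. Continuous expression values would have to be thresholded to 0/1 before rules like these could be applied, which is where the information loss of Boolean models comes from.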
3. MATERIALS AND METHODS
Many techniques have been used in the literature to find regulatory relationships between gene pairs from gene expression profiles. In this work, Pearson's correlation coefficient has been applied. The main steps of the proposed algorithm are as follows.
(1) Preprocessing of the dataset
(2) Identification of most significant genes
(3) Finding regulatory relationships between gene pairs
(4) Elimination of weak correlations
(5) Visualization of the network
(6) Biological validation
(7) Topological analysis
3.1. Preprocessing of Datasets
Gene expression data are usually provided in normalized form. The normalized value for each gene is typically reported as an ‘expression ratio’ or as the logarithm of the expression ratio. The expression ratio of a gene is the ratio of its normalized expression level in the query sample to its normalized expression level in the control. The datasets under consideration are already normalized; we preprocessed them to handle missing values, duplicate and missing gene names, and similar issues.
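The paper does not say how missing values and duplicate gene names were handled. The Python sketch below shows one common set of choices, all of which are assumptions rather than the authors' documented procedure: rows without a gene name are dropped, each missing value is replaced by the mean of the observed values for the same gene, and rows sharing a gene name are merged by averaging their profiles sample by sample.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple

# (gene name or None, expression values with None for missing entries)
Row = Tuple[Optional[str], List[Optional[float]]]


def preprocess(rows: List[Row]) -> Dict[str, List[float]]:
    """Clean a raw expression table (assumed choices, not the paper's method)."""
    grouped: Dict[str, List[List[float]]] = {}
    for name, values in rows:
        if name is None or not name.strip():
            continue  # drop rows without a usable gene name
        observed = [v for v in values if v is not None]
        if not observed:
            continue  # nothing to impute from, so drop the row
        fill = mean(observed)  # per-gene mean imputation
        filled = [fill if v is None else v for v in values]
        grouped.setdefault(name.strip(), []).append(filled)
    # Merge duplicate gene names by averaging their profiles sample by sample
    return {
        name: [mean(column) for column in zip(*profiles)]
        for name, profiles in grouped.items()
    }
```

Mean imputation is only one option; dropping incomplete genes or k-nearest-neighbour imputation are common alternatives, and the choice can affect downstream t-test and correlation values.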
3.2. Identification of Most Significant Genes
In this step, the genes that are differentially expressed in the disease condition are identified using a two-stage filtering strategy. In the first stage, a t-test is applied. For unpaired data, the t-statistic can be computed as follows; because this form does not assume equal variances, it can be used whether or not the variances of the two groups are equal:
$$t_i = \frac{\bar{x}_i - \bar{y}_i}{\sqrt{\dfrac{g_i^2}{n_1} + \dfrac{h_i^2}{n_2}}} \tag{1}$$
where $\bar{x}_i$ and $\bar{y}_i$ are the means and $g_i$ and $h_i$ the standard deviations (so $g_i^2$ and $h_i^2$ are the variances) of the expression profile of gene i in the two groups of samples (conditions), tissue and cultured, respectively, and $n_1$ and $n_2$ are the sizes of the two groups.
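Equation (1) translates directly into a few lines of Python. The sketch below uses only the standard library and returns the statistic, not a p-value; for p-values, `scipy.stats.ttest_ind(x, y, equal_var=False)` computes the same statistic together with a two-sided p-value, but whether the authors used that routine is not stated in the paper.

```python
import math
from statistics import mean, variance
from typing import Sequence


def t_statistic(x: Sequence[float], y: Sequence[float]) -> float:
    """Eq. (1): (mean_x - mean_y) / sqrt(g^2 / n1 + h^2 / n2).

    g^2 and h^2 are the sample variances (n - 1 denominator) of the two groups,
    so the statistic does not assume that the variances are equal.
    """
    n1, n2 = len(x), len(y)
    g2, h2 = variance(x), variance(y)
    return (mean(x) - mean(y)) / math.sqrt(g2 / n1 + h2 / n2)


if __name__ == "__main__":
    # Toy values for one gene in two groups of samples
    print(t_statistic([1, 2, 3, 4], [2, 4, 6, 8]))
```

For the toy values above the statistic is −√3 ≈ −1.732; a negative value simply means the first group has the lower mean.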
At the second stage, a fold-change strategy is applied. The fold change measures how much the expression level of a gene changes between two conditions or groups of samples. For linear-scale data, the fold change (FC) can be calculated as
$$FC_i = \log_2\frac{\bar{y}_i}{\bar{x}_i} = \log_2\bar{y}_i - \log_2\bar{x}_i \tag{2}$$
where $\bar{x}_i$ and $\bar{y}_i$ are the mean expression values of gene i in the tissue and cultured samples, respectively. If the gene expression data are already log2-transformed, the fold change can be computed as [15]
$$FC_i = \bar{y}_i - \bar{x}_i \tag{3}$$
where $\bar{x}_i$ and $\bar{y}_i$ are now the means of the log2-transformed values.
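Equations (2) and (3) can be sketched as two small Python functions, one per data scale. Note that averaging log2 values before differencing (Eq. 3) is not identical to taking the log of a ratio of arithmetic means (Eq. 2), so the two functions agree only approximately on the same underlying samples.

```python
import math


def fold_change_linear(mean_x: float, mean_y: float) -> float:
    """Eq. (2): log2(y / x) for linear-scale means (both must be positive)."""
    return math.log2(mean_y / mean_x)


def fold_change_log2(mean_x: float, mean_y: float) -> float:
    """Eq. (3): y - x when the means were computed from log2-transformed data."""
    return mean_y - mean_x


if __name__ == "__main__":
    print(fold_change_linear(2.0, 8.0))  # a 4-fold increase
    print(fold_change_log2(1.0, 3.0))    # same change, already on the log2 scale
```

Both calls return 2.0, i.e. a four-fold increase from x to y; a four-fold decrease would give −2.0.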
3.3. Finding Regulatory Relationships Between Gene Pairs
We applied the Pearson correlation coefficient $r_{xy}$ to find regulatory relationships between gene pairs x and y. The correlation is +1 if there is a perfect positive linear relationship and −1 if there is a perfect negative linear relationship; values between −1 and +1 indicate the degree of linear dependence between the variables. The closer the coefficient is to −1 or +1, the stronger the correlation between the variables. A coefficient of zero means that there is no linear relationship between the variables, although this does not by itself imply that they are independent. Given n samples (conditions) of genes x and y, written as $x_i$ and $y_i$ for i = 1, 2, ..., n, the correlation coefficient $r_{xy}$ can be estimated as
$$r_{xy} = \frac{n\sum_i x_i y_i - \sum_i x_i \sum_i y_i}{\sqrt{n\sum_i x_i^2 - \left(\sum_i x_i\right)^2}\,\sqrt{n\sum_i y_i^2 - \left(\sum_i y_i\right)^2}} \tag{4}$$
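A direct Python transcription of Eq. (4), built from the raw sums exactly as the formula is written, is sketched below. Python 3.10 and later also provide `statistics.correlation(x, y)`, which can serve as a cross-check. The raw-sum form can lose precision when values are large, and it divides by zero for a gene whose profile is constant, so such genes would need to be removed first.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Eq. (4): Pearson correlation computed from raw sums."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of samples")
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    syy = sum(v * v for v in y)
    sxy = sum(a * b for a, b in zip(x, y))
    numerator = n * sxy - sx * sy
    denominator = math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
    return numerator / denominator


if __name__ == "__main__":
    print(pearson_r([1, 2, 3], [2, 4, 6]))  # perfectly positive
    print(pearson_r([1, 2, 3], [6, 4, 2]))  # perfectly negative
```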
Once the pairwise correlation coefficients between genes have been computed, we keep only the coefficients whose absolute values are at or above a threshold and eliminate weakly correlated gene pairs. This strategy allows us to focus on a few highly connected genes. In this study, we observed that only a few genes were strongly correlated, mostly positively and a few negatively; the pairwise correlations among the remaining gene pairs were quite weak.
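The thresholding step can be sketched as follows. The gene names and profiles in the usage lines are hypothetical, and the 0.85 default is taken from the results below. Because Pearson correlation is symmetric, this sketch yields undirected pairs with a sign; the paper does not describe how a source and a target were assigned to each pair, so no direction is inferred here.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[str, str, str, float]  # (gene_a, gene_b, sign, r)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation via centred sums (equal to Eq. (4), numerically safer)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_edges(
    profiles: Dict[str, Sequence[float]], threshold: float = 0.85
) -> List[Edge]:
    """Keep gene pairs with |r| >= threshold; '+' = activation, '-' = repression."""
    edges: List[Edge] = []
    for a, b in combinations(sorted(profiles), 2):
        r = pearson_r(profiles[a], profiles[b])
        if abs(r) >= threshold:
            edges.append((a, b, "+" if r > 0 else "-", r))
    return edges


if __name__ == "__main__":
    toy = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [4, 3, 2, 1], "D": [1, 3, 2, 4]}
    for edge in correlation_edges(toy):
        print(edge)
```

In the toy data, gene D correlates with the others at only |r| = 0.8, so all of its pairs are dropped and D does not appear in the network.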
4. RESULTS AND DISCUSSION
In this study, prostate cancer microarray data [30] were used for network construction and topological analysis. The dataset (the full dataset can be downloaded from GEO at http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE26126) consists of 27575 genes measured in 181 tissue samples and 12 cultured samples. To identify the most significant genes, two-stage filtering was applied. In the first stage, a two-tailed t-test was used, and only genes with a p-value ≤ 0.001 were considered significant and retained for analysis. Of the 27575 genes in the dataset, 9985 (approximately 36%) passed this filter. The formula for the t-statistic for unpaired data is given in equation (1). In the second stage, a fold-change measure was applied to evaluate the change in expression level of each gene; it can be calculated using equation (2) for linear-scale data or equation (3) for log2-transformed data. Only genes showing at least a five-fold change in expression level were retained, which left 101 genes: about 1% of the 9985 genes extracted in the first stage and about 0.37% of all 27575 genes.
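The two-stage filter can be sketched as below, assuming the p-values and log2 fold changes have already been computed for every gene. The gene names are hypothetical, and applying the five-fold threshold symmetrically on the log2 scale (so that five-fold down-regulation also passes) is an assumption, since the paper does not say whether down-regulated genes were kept or on which scale the threshold was applied.

```python
import math
from typing import Dict, List, Tuple


def two_stage_filter(
    stats: Dict[str, Tuple[float, float]],
    p_max: float = 0.001,
    min_fold: float = 5.0,
) -> List[str]:
    """Return genes passing both stages.

    stats maps gene -> (t-test p-value, log2 fold change). The fold threshold is
    applied symmetrically on the log2 scale: |log2 FC| >= log2(5), about 2.32.
    """
    log2_cut = math.log2(min_fold)
    selected = []
    for gene, (p_value, log2_fc) in stats.items():
        if p_value > p_max:          # stage 1: t-test significance
            continue
        if abs(log2_fc) < log2_cut:  # stage 2: minimum fold change
            continue
        selected.append(gene)
    return selected


if __name__ == "__main__":
    toy = {"g1": (0.0005, 3.0), "g2": (0.0005, 1.0), "g3": (0.01, 4.0), "g4": (0.001, -2.5)}
    print(two_stage_filter(toy))
```

Here g2 fails the fold-change stage, g3 fails the significance stage, and g1 and g4 are kept.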
Next, the pairwise Pearson correlations among the 101 extracted genes were computed using equation (4), and weak correlations between gene pairs were dropped. Pairs with an absolute correlation of at least 0.85 were considered strongly correlated; this identified 55 regulatory relationships involving only 29 genes, further reducing the set of genes under consideration to about 0.1% of the 27575 genes.
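The size of each filtering stage relative to the whole dataset is simple arithmetic on the counts reported above; the short Python sketch below makes the percentages explicit.

```python
TOTAL_GENES = 27575

# (stage, genes remaining), using the counts reported in the text
FUNNEL = [
    ("t-test, p <= 0.001", 9985),
    ("at least five-fold change", 101),
    ("|r| >= 0.85 (network genes)", 29),
]


def percent(part: int, whole: int) -> float:
    """Share of `whole` represented by `part`, in percent."""
    return 100.0 * part / whole


if __name__ == "__main__":
    for stage, count in FUNNEL:
        print(f"{stage}: {count} genes = {percent(count, TOTAL_GENES):.2f}% of all genes")
    print(f"fold-change stage kept {percent(101, 9985):.2f}% of the t-test survivors")
```

This prints 36.21%, 0.37% and 0.11% of all genes for the three stages, and 1.01% for the share of the 9985 t-test survivors kept by the fold-change stage.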
Finally, the 29 extracted genes were validated against available biological databases and the literature. Most of these 29 genes have been reported to be involved in prostate cancer in some way. Table 1 shows the validation of individual genes and gene families from various biological databases and the literature. Table 2 lists the interacting gene pairs and indicates whether each relation is activating (+) or repressing (−): a positive correlation is interpreted as activation and a negative one as repression (inhibition). Of the 55 extracted regulatory relations, 52 are activating and only 3 are inhibitory.
Table 1. List of genes found to be involved in prostate cancer.
Genes | Brief description | Reference(s)
GAS6 family, MYO3A | Overexpressed in androgen-independent compared with androgen-dependent prostate cancer cells | A. P. Singh et al. (2008) [16]
S100A16 family, PSMD1, RND2, PLAT, PPP3R1 family | Under-expressed in androgen-independent compared with androgen-dependent prostate cancer cells | A. P. Singh et al. (2008) [16]
KCNS2, SLC17A8 family, COL17A1 family | Potential secretory biomarkers for selenium action in prostate cancer | Hongjuan Zhao et al. (2004) [17]
PHLDA1 | Changes in expression in benign prostatic hyperplasia (BPH) | Hongjuan Zhao et al. (2004) [17]
KRT5 | Marker of basal cells in prostate glands | Uma R. Chandran et al. (2007) [18]
CA9 | Expressed in the C4-2 prostate cancer cell line | Asa J. Oudes et al. (2005) [19]
SLAMF9 | Member of the CD2 subfamily, which, in association with the EWI subfamily, inversely correlates with the metastatic potential of prostate cancer | Xin A. Zhang et al. (2003) [20]
AQP10 | Expression and cellular localization of the AQPs were determined in human prostate cancer | Insang Hwang et al. (2012) [21]
BNIP3 | Overexpressed in various tumors, including prostate cancer | Xueqin Chen et al. (2010) [22]
ZFAND2B | Zinc finger, AN1-type domain 2B; expressed in prostate cancer and many other tissues | G2SBC Database [23]
SRPX2 | A nucleic acid encoding SRPX2 can act as a target for cancers such as prostate cancer | Imhof et al. (2007) [24]
CSTF1 | Part of the 186-gene “invasiveness” gene signature (IGS), which is associated not only with breast cancer but also with many other cancers, such as prostate cancer | Rui Liu (2007) [25]
KCNE2 | KCNQ1 forms a complex with the KCNE2 family and is involved in regulation in prostate cancer | NCBI [26]
Next, the network was constructed using the Cytoscape software tool, as shown in Fig. 1. From the
constructed network in Fig. 1, we can identify KRT5, BNIP3, GJB5 and KCNE2 as hub genes: the first
three have a total degree (in-degree plus out-degree) of eight, and the last has a total degree of
six. GJB5 activates seven other genes (SLAMF9, PAK6, COL17A1, HCAR2, C8A, S100A16 and KRT5) and is
itself activated by the other hub gene, KCNE2. Similarly, KCNE2 activates six other genes: GJB5,
SLAMF9, COL17A1, HCAR2, PAK6 and KRT5. CSTF1 does not activate any other gene; rather, it is
activated by EMP1 and inhibited by two other genes, SRPX2 and the hypothetical LOC401459. The
hypothetical LOC401459 inhibits only CSTF1, and is itself activated by SRPX2 and inhibited by EMP1.
Many genes in the network, such as PSMD1, CSTF1, ZFAND2B, BNIP3, PLAT, C8A and HCAR2, do not
regulate (either activate or inhibit) any other gene. Fig. 1 also shows that BNIP3 is activated by
a large number of genes and is therefore expected to be overexpressed. This is consistent with the
literature [22], which reports that BNIP3 is overexpressed in various tumors, including prostate
cancer. The other identified hub gene, KRT5, is a marker of basal cells in prostate glands and shows
uniform downregulation in all metastatic tumors [18]. The literature [26] also reports that KCNQ1
forms a complex with the KCNE2 family (KCNE2 being another of the identified hub genes) and plays a
regulatory role in prostate cancer.
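The degree-based hub identification described above can be reproduced from the edges of Table 2 with a few lines of standard-library Python. This is a sketch, not the authors' Cytoscape workflow: the string encoding of Table 2 and the function names `degrees`, `total_degree` and `non_regulators` are hypothetical, and total degree is taken as in-degree plus out-degree, as in the text.

```python
from collections import Counter

# Table 2 transcribed as source -> space-separated signed targets
# ("+" activates, "-" represses). "LOC401459" abbreviates
# "hypothetical LOC401459". Illustrative encoding only.
TABLE2 = {
    "GJB5": "+SLAMF9 +PAK6 +COL17A1 +HCAR2 +C8A +S100A16 +KRT5",
    "KCNE2": "+GJB5 +SLAMF9 +COL17A1 +HCAR2 +PAK6 +KRT5",
    "S100A16": "+PAK6 +PHLDA1 +C8A +PLAT",
    "ZNF577": "+RND2 +SLC17A8 +GAS6 +BNIP3",
    "GAS6": "+RND2 +BNIP3 +SLC17A8 +PAK6",
    "KRT5": "+PAK6 +BNIP3 +ST18",
    "COL17A1": "+BNIP3 +C8A +HCAR2",
    "AQP10": "+SLC17A8 +ST18 +RND2",
    "EMP1": "+SRPX2 +CSTF1 -LOC401459",
    "PPP3R1": "+ZFAND2B +YWHAH",
    "CA9": "+S100A16 +AQP10",
    "SRPX2": "+LOC401459 -CSTF1",
    "SLC17A8": "+RND2 +BNIP3",
    "PAK6": "+BNIP3 +RND2",
    "RND2": "+BNIP3",
    "PHLDA1": "+PLAT",
    "YWHAH": "+PSMD1",
    "KCNS2": "+SLC17A8",
    "SLAMF9": "+HCAR2",
    "MYO3A": "+RND2",
    "C8A": "+PAK6",
    "LOC401459": "-CSTF1",
}


def degrees(table):
    """Return (in-degree, out-degree) Counters for the directed network."""
    indeg, outdeg = Counter(), Counter()
    for source, targets in table.items():
        for token in targets.split():
            outdeg[source] += 1    # one outgoing edge per listed target
            indeg[token[1:]] += 1  # strip the sign to get the target gene
    return indeg, outdeg


def total_degree(table):
    """Total degree = in-degree + out-degree, for every gene in the network."""
    indeg, outdeg = degrees(table)
    return {g: indeg[g] + outdeg[g] for g in set(indeg) | set(outdeg)}


def non_regulators(table):
    """Genes with no outgoing edge, i.e. genes that regulate no other gene."""
    indeg, outdeg = degrees(table)
    # Counter returns 0 for missing keys, so pure targets have outdeg 0.
    return sorted(g for g in indeg if outdeg[g] == 0)


if __name__ == "__main__":
    deg = total_degree(TABLE2)
    for gene in ("GJB5", "KCNE2", "BNIP3", "KRT5"):
        print(gene, deg[gene])
    print(non_regulators(TABLE2))
```

Run on the Table 2 edges as transcribed here, the sketch reproduces the total degrees stated above for GJB5 (8) and KCNE2 (6). It gives 7 for BNIP3 and 5 for KRT5, and its list of non-regulating genes contains ST18 rather than C8A, because Table 2 lists C8A → PAK6.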
Table 2. Regulatory relations between gene pairs: activation (+) or repression (−).
Source | Activates (+) | Represses (−)
GJB5 | SLAMF9, PAK6, COL17A1, HCAR2, C8A, S100A16, KRT5 | –
KCNE2 | GJB5, SLAMF9, COL17A1, HCAR2, PAK6, KRT5 | –
S100A16 | PAK6, PHLDA1, C8A, PLAT | –
ZNF577 | RND2, SLC17A8, GAS6, BNIP3 | –
GAS6 | RND2, BNIP3, SLC17A8, PAK6 | –
KRT5 | PAK6, BNIP3, ST18 | –
COL17A1 | BNIP3, C8A, HCAR2 | –
AQP10 | SLC17A8, ST18, RND2 | –
EMP1 | SRPX2, CSTF1 | hypothetical LOC401459
PPP3R1 | ZFAND2B, YWHAH | –
CA9 | S100A16, AQP10 | –
SRPX2 | hypothetical LOC401459 | CSTF1
SLC17A8 | RND2, BNIP3 | –
PAK6 | BNIP3, RND2 | –
RND2 | BNIP3 | –
PHLDA1 | PLAT | –
YWHAH | PSMD1 | –
KCNS2 | SLC17A8 | –
SLAMF9 | HCAR2 | –
MYO3A | RND2 | –
C8A | PAK6 | –
hypothetical LOC401459 | – | CSTF1
Figure 1. Inferred gene regulatory network of 29 genes and 55 regulatory relations obtained using the
proposed methodology. KRT5, BNIP3, GJB5 and KCNE2 participate as hub genes.
5. CONCLUSIONS
The complex molecular interactions underlying cancer arise from perturbations in gene regulatory
networks. Therefore, identifying cancer-related genes and the pathways they control through gene
regulatory networks is a key step towards cancer diagnosis. A directed regulatory network can
reveal interactions among genes more faithfully and can also capture cause-effect relations between
gene pairs. This paper reports a simple statistical approach that extracts differentially expressed
genes and computes correlations between gene pairs to reconstruct gene regulatory networks under
specific disease conditions, which aids the interpretability of the network. First, genes relevant
to a specific cancer were identified using a t-test and the fold-change method. Pairwise correlation
coefficients were then calculated for gene pairs, and a threshold was imposed to eliminate weakly
correlated pairs, leaving 55 strongly correlated gene pairs involving 29 genes. Next, a regulatory
network was constructed using the Cytoscape software tool. Analysis of the constructed network
showed that several genes, including KRT5, BNIP3, GJB5 and KCNE2, act as hub genes. Among them,
BNIP3 is a highly activated (and hence likely overexpressed) gene, and it has been reported to be
overexpressed in prostate cancer [22]. The other hub gene, KRT5, is a marker of basal cells in
prostate glands and shows uniform downregulation in all metastatic tumors [18]. The results also
show that KCNE2 regulates a large number of genes, which is consistent with [26], where it is
reported to play a regulatory role in prostate cancer.
Regulatory relationships among genes in cancer are neither freely accessible from databases nor
readily available in the literature. As a result, constructing gene regulatory networks and
validating them realistically is a difficult task. The utility and reliability of our study need
further experimental validation. Our findings may help reveal common molecular interactions in the
cancer under study and provide new insights into cancer diagnostics, prognostics and therapy. The
proposed approach can also be used to investigate other disease-specific gene regulatory networks,
such as those of colon, lung and breast cancer. In future work, we will try to construct regulatory
networks for other types of cancer from microarray data.
Microarray data are inherently noisy due to experimental limitations, and noise in the dataset
directly affects the results of statistical techniques. Today, artificial-intelligence-based
approaches such as fuzzy logic, neural networks and evolutionary computation are used for many
bioinformatics research problems. The ability of fuzzy logic to tolerate noise and deal with
imprecision, of neural networks to learn from data-rich environments, and of evolutionary
computation to perform optimization makes these techniques good candidates for inferring gene
regulatory networks from microarray data. In the future, we can apply these more sophisticated
artificial-intelligence-based techniques to construct cancer-specific regulatory networks more
accurately.
ACKNOWLEDGEMENTS
The authors would like to thank all scientists behind the publicly available data sets. The author
K. Raza acknowledges funding from the University Grants Commission, Govt. of India, through
research grant 42-1019/2013(SR). The co-author R. Jaiswal acknowledges the Department of
Computer Science, Jamia Millia Islamia, New Delhi, India, for providing the necessary facilities to
carry out this research.
REFERENCES
[1] K. Vaishali & A. Vinayababu, (2011). "Application of microarray technology and soft computing
in cancer biology: a review", International Journal of Biometrics and Bioinformatics (IJBB), vol.
5, no. 4, pp. 225-233.
[2] Jeffrey D. Allen, et al., (2012). "Comparing statistical methods for constructing large scale gene
networks", PLoS ONE, vol. 7, no. 1, pp. e29348.
[3] Rui Xu, et al., (2007). "Inference of genetic regulatory networks with recurrent neural network
models using particle swarm optimization", IEEE/ACM Transactions on Computational Biology
and Bioinformatics, vol. 4, no. 4, pp. 681-692.
[4] K. Raza & R. Parveen, (2012). "Evolutionary algorithm in genetic regulatory networks model",
Journal of Advanced Bioinformatics Applications and Research, vol. 3, no. 1, pp. 271–280.
[5] W.-P. Lee & K.-C. Yang, (2008). "A clustering-based approach for inferring recurrent neural
networks as gene regulatory networks", Neurocomputing, vol. 71, no. 4, pp. 600-610.
[6] R. Siegel, (2011). "Cancer statistics, 2011: the impact of eliminating socioeconomic and racial
disparities on premature cancer deaths", CA Cancer J Clin, vol. 61, pp. 212–236.
[7] P. D. Baade, D. R. Youlden & L. J. Krnjacki, (2009). "International epidemiology of prostate cancer:
geographical distribution and secular trends", Molecular Nutrition & Food Research, vol. 53, no.
2, pp. 171–184.
[8] K. Raza & R. Parveen, (2012). "Soft computing approach for modeling genetic regulatory
networks", Advances in Computing and Information Technology, vol. 178, pp. 1-12.
[9] H. W. Ressom, et al., (2006). "Inference of gene regulatory networks from time course gene
expression data using neural networks and swarm intelligence", In Proceedings of the IEEE
Symposium on Computational Intelligence and Bioinformatics and Computational Biology, pp. 1-8.
[10] G. Karlebach & R. Shamir, (2008). "Modelling and analysis of gene regulatory networks", Nature
Reviews Molecular Cell Biology, vol. 9, pp. 770-780.
[11] S. Mitra, et al., (2011). "Genetic networks and soft computing", IEEE/ACM Transactions on
Computational Biology and Bioinformatics, vol. 8, no. 1.
[12] K. Raza & A. Mishra, (2012). "A novel anticlustering filtering algorithm for the prediction of
genes as a drug target", American Journal of Biomedical Engineering, vol. 2, issue 5, pp. 206–211.
[13] T. Martin, et al., (2010). "Comparative study of three commonly used continuous deterministic
methods for modeling gene regulation networks", BMC Bioinformatics, vol. 11, pp. 459.
[14] H. de Jong, (2002). "Modeling and simulation of genetic regulatory systems: a literature review",
Journal of Computational Biology, vol. 9, issue 1, pp. 67-103.
[15] V. Farztdinov & F. McDyer, (2012). "Distributional fold change test – a statistical approach for
detecting differential expression in microarray experiments", Algorithms for Molecular
Biology, vol. 7, no. 1, pp. 29.
[16] AP Singh, et al., (2008). "Genome-wide expression profiling reveals transcriptomic variation and
perturbed gene networks in androgen-dependent and androgen-independent prostate cancer cells",
Cancer Lett. vol. 259, issue 1, pp. 28–38.
[17] H. Zhao, et al., (2004). "Diverse effects of methylseleninic acid on the transcriptional program of
human prostate cancer cells", Molecular Biology of the Cell (The American Society for Cell Biology),
vol. 15, issue 2, pp. 506-519.
[18] U. R. Chandran, et al., (2007). "Gene expression profiles of prostate cancer reveal involvement of
multiple molecular pathways in the metastatic process", BMC Cancer, vol. 7, no. 64.
[19] A. J. Oudes, et al., (2005). "Application of Affymetrix array and massively parallel signature
sequencing for identification of genes involved in prostate cancer progression", BMC Cancer, vol.
5, issue 86.
[20] X.A. Zhang, et al., (2003). "EWI2/PGRL associates with the metastasis suppressor KAI1/CD82
and inhibits the migration of prostate cancer cells", Cancer Research, vol. 63, pp. 2665–2674.
[21] I. Hwang, et al., (2012). "Expression and localization of aquaporins in Benign prostate
Hyperplasia and prostate cancer", Chonnam Medical Journal, vol. 48, issue 3, pp. 174-178.
[22] X. Chen, et al., (2010). "MicroRNA145 targets BNIP3 and suppresses prostate cancer
progression", Cancer Research, vol. 70, issue 7, pp. 2728–38.
[23] G2SBC- genes to systems breast cancer database
(http://www.itb.cnr.it/breastcancer/php/geneReport.php?id=130617#)
[24] Imhof, et al., (2009). "Modulation of SRPX2-mediated angiogenesis", WIPO Patent
2009111444, issued September 12, 2009.
[25] R. Liu, (2007). "The Prognostic role of a gene signature from Tumorigenic breast-cancer cells",
Massachusetts Medical Society.
[26] www.ncbi.nlm.nih.gov/IEB/Research/Acembly/av.cgi?db=human&c=Gene&l=KCNQ1.
[27] X. Wang & O. Gotoh, (2010). "Inference of Cancer-specific gene regulatory networks using soft
computing rules", Gene Regulation and Systems Biology, vol. 4, pp. 19–34.
[28] K. Basso, A. A. Margolin, et al., (2005). "Reverse engineering of regulatory networks in human B
cells", Nature Genetics, vol. 37, pp. 382-390.
[29] W. Jiang, X. Li, et al., (2008). "Constructing disease-specific gene networks using pair-wise
relevance metric: Application to colon cancer identifies interleukin 8, desmin and enolase 1 as the
central elements", BMC Systems Biology, vol. 2, no. 72.
[30] Y. Kobayashi, et al., (2011). "DNA methylation profiling reveals novel biomarkers and important
roles for DNA methyltransferases in prostate cancer", Genome Res, vol. 21, issue 7, pp. 1017-27.