Meta-docking is a Bayesian mixture model for improving protein-ligand interaction predictions from multiple docking scores. It accounts for differences in score distributions between active and decoy ligands arising from docking program and ligand effects. Compared to standard consensus docking, meta-docking shows small but consistent improvements in ranking ligands, detecting more active ligands among top ranks. Future work includes investigating inter-ligand features from multiple programs to further improve ligand ranking.
Stable Drug Designing by Minimizing Drug Protein Interaction Energy Using PSO csandit
1. The document proposes using a particle swarm optimization (PSO) algorithm to design stable drug molecules that minimize interaction energy with target proteins.
2. In the algorithm, drugs are represented as variable-length trees containing functional groups, and PSO is used to optimize van der Waals and electrostatic interaction energies.
3. Results show that PSO performs better than previous fixed-length tree methods at designing drugs that stably bind to active sites of human rhinovirus, malaria, and HIV proteins.
Each and every biological function in living organism occurs due to protein-protein interactions. The
diseases are no exception to this. Identifying one or more proteins for a particular disease and then
designing a suitable chemical compound (which is known as drug or ligand) to destroy those proteins is a
challenging topic of research in computational biology. In earlier methods, drugs were designed using only
a few chemical components and were represented as a fixed-length tree. But in reality, a drug contains
many chemical groups collectively known as pharmacophore. Moreover, the chemical length of the drug
cannot be determined before designing that drug.
In the present work, a Particle Swarm Optimization (PSO) based methodology has been proposed to find
out a suitable drug for a particular disease so that the drug-target protein interaction energy becomes
minimum. In the proposed algorithm, the drug is represented as a variable length tree and essential
functional groups are arranged in different positions of that drug. Finally, the structure of the drug is
obtained and its docking energy is minimized simultaneously. Also, the orientation of chemical groups in
the drug is tested so that it can bind to a particular active site of a target protein and the drug fits well
inside the active site of target protein. Here, several inter-molecular forces have been considered for
accuracy of the docking energy. Results are demonstrated for three different target proteins both
numerically and pictorially. Results show that PSO performs better than the earlier methods.
Multi-Objective Optimization for Clustering of Medical PublicationsDiego Molla-Aliod
A. Ekbal, S. Saha, D. Mollá, and K. Ravikumar.
Multi-Objective Optimization for Clustering of Medical
Publications (2013). Proceedings of the Australasian
Language Technology Association Workshop 2013
(ALTA 2013),
pp53-61, Brisbane, Australia. http://aclweb.org/anthology/U/U13/
GPCODON ALIGNMENT: A GLOBAL PAIRWISE CODON BASED SEQUENCE ALIGNMENT APPROACHijdms
The alignment of two DNA sequences is a basic step in the analysis of biological data. Sequencing a long
DNA sequence is one of the most interesting problems in bioinformatics. Several techniques have been
developed to solve this sequence alignment problem like dynamic programming and heuristic algorithms.
In this paper, we introduce (GPCodon alignment) a pairwise DNA-DNA method for global sequence
alignment that improves the accuracy of pairwise sequence alignment. We use a new scoring matrix to
produce the final alignment called the empirical codon substitution matrix. Using this matrix in our
technique enabled the discovery of new relationships between sequences that could not be discovered using
traditional matrices. In addition, we present experimental results that show the performance of the
proposed technique over eleven datasets of average length of 2967 bps. We compared the efficiency and
accuracy of our techniques against a comparable tool called “Pairwise Align Codons” [1].
This document discusses issues with commonly used ligand efficiency metrics. It argues that ligand efficiency metrics make unrealistic assumptions by normalizing potency based on trends not actually observed in data. Specifically, ligand efficiency assumes a linear relationship between potency and risk factors like lipophilicity, but data does not always support this assumption. It also notes that ligand efficiency incorporates arbitrary concentration units that can affect calculated values. The document suggests plotting affinity against risk factors to test the assumptions behind ligand efficiency metrics.
Ligand efficiency: nice concept shame about the metricsPeter Kenny
Ligand efficiency metrics are meant to normalize ligand activity with respect to molecular properties that increase risk, however they make assumptions about relationships that are not always valid. Residuals, which quantify the extent to which activity beats a trend line fit to the data, may be a better approach as they do not depend on standard concentration units or require defining a trend line a priori. Normalizing data requires accurately modeling the observed trends in a data set, and ligand efficiency metrics can distort perception if the trends are not properly considered.
This document describes two machine learning techniques, particle swarm optimization with support vector machines (PSO-SVM) and recursive feature elimination with support vector machines (RFE-SVM), that were used to classify autism neuroimaging data from the Autism Brain Imaging Data Exchange database. PSO-SVM was used to select discriminative features for classification, while RFE-SVM ranked features by importance. Both techniques aimed to improve classification accuracy and reduce overfitting by selecting optimal feature subsets from the high-dimensional neuroimaging data. The results could help develop brain-based diagnostic criteria for autism.
Stable Drug Designing by Minimizing Drug Protein Interaction Energy Using PSO csandit
1. The document proposes using a particle swarm optimization (PSO) algorithm to design stable drug molecules that minimize interaction energy with target proteins.
2. In the algorithm, drugs are represented as variable-length trees containing functional groups, and PSO is used to optimize van der Waals and electrostatic interaction energies.
3. Results show that PSO performs better than previous fixed-length tree methods at designing drugs that stably bind to active sites of human rhinovirus, malaria, and HIV proteins.
Each and every biological function in living organism occurs due to protein-protein interactions. The
diseases are no exception to this. Identifying one or more proteins for a particular disease and then
designing a suitable chemical compound (which is known as drug or ligand) to destroy those proteins is a
challenging topic of research in computational biology. In earlier methods, drugs were designed using only
a few chemical components and were represented as a fixed-length tree. But in reality, a drug contains
many chemical groups collectively known as pharmacophore. Moreover, the chemical length of the drug
cannot be determined before designing that drug.
In the present work, a Particle Swarm Optimization (PSO) based methodology has been proposed to find
out a suitable drug for a particular disease so that the drug-target protein interaction energy becomes
minimum. In the proposed algorithm, the drug is represented as a variable length tree and essential
functional groups are arranged in different positions of that drug. Finally, the structure of the drug is
obtained and its docking energy is minimized simultaneously. Also, the orientation of chemical groups in
the drug is tested so that it can bind to a particular active site of a target protein and the drug fits well
inside the active site of target protein. Here, several inter-molecular forces have been considered for
accuracy of the docking energy. Results are demonstrated for three different target proteins both
numerically and pictorially. Results show that PSO performs better than the earlier methods.
Multi-Objective Optimization for Clustering of Medical PublicationsDiego Molla-Aliod
A. Ekbal, S. Saha, D. Mollá, and K. Ravikumar.
Multi-Objective Optimization for Clustering of Medical
Publications (2013). Proceedings of the Australasian
Language Technology Association Workshop 2013
(ALTA 2013),
pp53-61, Brisbane, Australia. http://aclweb.org/anthology/U/U13/
GPCODON ALIGNMENT: A GLOBAL PAIRWISE CODON BASED SEQUENCE ALIGNMENT APPROACHijdms
The alignment of two DNA sequences is a basic step in the analysis of biological data. Sequencing a long
DNA sequence is one of the most interesting problems in bioinformatics. Several techniques have been
developed to solve this sequence alignment problem like dynamic programming and heuristic algorithms.
In this paper, we introduce (GPCodon alignment) a pairwise DNA-DNA method for global sequence
alignment that improves the accuracy of pairwise sequence alignment. We use a new scoring matrix to
produce the final alignment called the empirical codon substitution matrix. Using this matrix in our
technique enabled the discovery of new relationships between sequences that could not be discovered using
traditional matrices. In addition, we present experimental results that show the performance of the
proposed technique over eleven datasets of average length of 2967 bps. We compared the efficiency and
accuracy of our techniques against a comparable tool called “Pairwise Align Codons” [1].
This document discusses issues with commonly used ligand efficiency metrics. It argues that ligand efficiency metrics make unrealistic assumptions by normalizing potency based on trends not actually observed in data. Specifically, ligand efficiency assumes a linear relationship between potency and risk factors like lipophilicity, but data does not always support this assumption. It also notes that ligand efficiency incorporates arbitrary concentration units that can affect calculated values. The document suggests plotting affinity against risk factors to test the assumptions behind ligand efficiency metrics.
Ligand efficiency: nice concept shame about the metricsPeter Kenny
Ligand efficiency metrics are meant to normalize ligand activity with respect to molecular properties that increase risk, however they make assumptions about relationships that are not always valid. Residuals, which quantify the extent to which activity beats a trend line fit to the data, may be a better approach as they do not depend on standard concentration units or require defining a trend line a priori. Normalizing data requires accurately modeling the observed trends in a data set, and ligand efficiency metrics can distort perception if the trends are not properly considered.
This document describes two machine learning techniques, particle swarm optimization with support vector machines (PSO-SVM) and recursive feature elimination with support vector machines (RFE-SVM), that were used to classify autism neuroimaging data from the Autism Brain Imaging Data Exchange database. PSO-SVM was used to select discriminative features for classification, while RFE-SVM ranked features by importance. Both techniques aimed to improve classification accuracy and reduce overfitting by selecting optimal feature subsets from the high-dimensional neuroimaging data. The results could help develop brain-based diagnostic criteria for autism.
This document discusses issues with commonly used methods in molecular design and data analysis, including ligand efficiency metrics (LEMs). It argues that LEMs make unfounded assumptions about relationships between activity and risk factors. Instead, residuals from modeling activity data directly should be used to quantify how much activity exceeds trends, as they are invariant to choices like standard concentration and treat all risk factors consistently. The document advocates understanding data properties before analysis and avoiding practices like arbitrarily binning data that can distort correlations.
IRJET- Classification of Chemical Medicine or Drug using K Nearest Neighb...IRJET Journal
This document proposes using a combination of K-nearest neighbors (KNN) and genetic algorithms to classify chemical medicine or drug data with improved accuracy. KNN is described as a simple and effective classification algorithm that stores training data instances. Genetic algorithms are presented as evolutionary algorithms useful for optimization problems. The proposed system applies genetic search to rank attribute importance, selects high-ranked attributes, and then applies both KNN and genetic algorithms to classify the drug data, aiming to improve classification accuracy over using either technique alone. The combination of KNN and genetic algorithms is expected to better optimize classification of complex medical data compared to other algorithms.
Thermodynamics for medicinal chemistry designPeter Kenny
This document discusses concepts in medicinal chemistry design and thermodynamics. It begins by outlining some challenges in drug discovery, such as targeting weakly linked disease targets and predicting toxicity. It then discusses molecular design approaches, including controlling compound properties and sampling chemical space. Key concepts discussed include target engagement potential, property-based design to find an optimal "sweet spot", and using thermodynamics and molecular interactions to analyze activity and properties. The document questions the use of rules and guidelines in medicinal chemistry and advocates analyzing data to understand actual trends rather than assuming functional forms. It also discusses issues with ligand efficiency metrics and advocates using residuals to quantify activity compared to observed trends in the data.
APPLICATION OF CLONAL SELECTION IMMUNE SYSTEM METHOD FOR OPTIMIZATION OF DIST...UniversitasGadjahMada
This paper proposes an application of clonal selection immune system method for optimization of distribution network. The distribution network with high-performance is a network that has a low power loss, better voltage profile, and loading balance among feeders. The task for improving the performance of the distribution network is optimization of network configuration. The optimization has become a necessary study with the presence of DG in entire networks. In this work, optimization of network configuration is based on an AIS algorithm. The methodology has been tested in a model of 33 bus IEEE radial distribution networks with and without DG integration. The results have been showed that the optimal configuration of the distribution network is able to reduce power loss and to improve the voltage profile of the distribution network significantly.
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
A clonal based algorithm for the reconstruction of genetic network using s sy...eSAT Journals
Abstract Motivation: Gene regulatory network is the network based approach to represent the interactions between genes. DNA microarray is the most widely used technology for extracting the relationships between thousands of genes simultaneously. Gene microarray experiment provides the gene expression data for a particular condition and varying time periods. The expression of a particular gene depends upon the biological conditions and other genes. In this paper, we propose a new method for the analysis of microarray data. The proposed method makes use of S-system, which is a well-accepted model for the gene regulatory network reconstruction. Since the problem has multiple solutions, we have to identify an optimized solution. Evolutionary algorithms have been used to solve such problems. Though there are a number of attempts already been carried out by various researchers, the solutions are still not that satisfactory with respect to the time taken and the degree of accuracy achieved. Therefore, there is a need of huge amount further work in this topic for achieving solutions with improved performances. Results: In this work, we have proposed Clonal selection algorithm for identifying optimal gene regulatory network. The approach is tested on the real life data: SOS Ecoli DNA repairing gene expression data. It is observed that the proposed algorithm converges much faster and provides better results than the existing algorithms. Index Terms: Microarray analysis, Evolutionary Algorithm, Artificial Immune System, S-system, Gene Regulatory Network, SOS Ecoli DNA repairing, Clonal Selection Algorithm.
Mining Big datasets to create and validate machine learning modelsSean Ekins
This document summarizes efforts to mine large datasets to create and validate machine learning models for drug discovery. It discusses using datasets from PubChem, ChEMBL, and ToxCast containing hundreds of thousands to millions of compounds tested against various targets and endpoints. Models were built using these datasets with Bayesian algorithms and fingerprints. The models achieved good performance and were shared online and incorporated into mobile apps. Future work discussed expanding this approach to even larger datasets from PubChem and ToxCast to further test scaling of algorithms and model validation.
Property-based molecular design: where next? (12-Jun-2015)Peter Kenny
The document discusses property-based molecular design and some challenges in drug discovery. It notes that molecular design aims to control compound behavior through manipulation of molecular properties in a hypothesis-driven or prediction-driven manner. However, toxicity can be unpredictable and measuring free drug concentrations in vivo is difficult. The document also discusses using structural relationships between compounds as a framework for molecular design and analysis of activity and properties, and notes that both hypothesis-driven and prediction-driven approaches have limitations that require further consideration.
Improving the effectiveness of information retrieval system using adaptive ge...ijcsit
The document describes research into improving the effectiveness of information retrieval systems using an adaptive genetic algorithm. A genetic algorithm with variable crossover and mutation probabilities (adaptive GA) is investigated. The adaptive GA is tested on 242 Arabic abstracts using three information retrieval models: vector space model, extended Boolean model, and language model. Results show the adaptive GA approach improves retrieval effectiveness over traditional genetic algorithms and baseline information retrieval systems, as measured by average recall and precision. Key aspects of the adaptive GA used include variable crossover and mutation probabilities tuned during the search process, and fitness functions based on document retrieval order.
Initial Optimal Parameters of Artificial Neural Network and Support Vector Re...IJECEIAES
This paper presents architecture of backpropagation Artificial Neural Network (ANN) and Support Vector Regression (SVR) models in supervised learning process for cement demand dataset. This study aims to identify the effectiveness of each parameter of mean square error (MSE) indicators for time series dataset. The study varies different random sample in each demand parameter in the network of ANN and support vector function as well. The variations of percent datasets from activation function, learning rate of sigmoid and purelin, hidden layer, neurons, and training function should be applied for ANN. Furthermore, SVR is varied in kernel function, lost function and insensitivity to obtain the best result from its simulation. The best results of this study for ANN activation function is Sigmoid. The amount of data input is 100% or 96 of data, 150 learning rates, one hidden layer, trinlm training function, 15 neurons and 3 total layers. The best results for SVR are six variables that run in optimal condition, kernel function is linear, loss function is ౬ -insensitive, and insensitivity was 1. The better results for both methods are six variables. The contribution of this study is to obtain the optimal parameters for specific variables of ANN and SVR.
Aspects of pharmaceutical molecular design (Fidelta version)Peter Kenny
This document discusses various aspects of pharmaceutical molecular design. It touches on three key points:
1) Pharmaceutical molecular design aims to control compound behavior through manipulation of molecular properties in a hypothesis-driven or prediction-driven manner.
2) Hypothesis-driven design frameworks help efficiently assemble structure-activity relationships to better understand molecules and ask insightful questions.
3) Prediction-driven design assumes predictive models can be built with sufficient accuracy, though issues like non-uniform sampling of chemical space and overfitting remain challenges.
partition coefficients in drug discoveryPeter Kenny
Partition coefficients are commonly used to model drug permeability and solubility. While octanol/water is typically used, it may not fully account for hydrogen bonding abilities. Differences between octanol/water and alkane/water logP values can provide insights into a drug's hydrogen bonding. A ClogPalk model was developed using molecular surface area and functional group perturbations to predict alkane/water logP. Structural relationships between compounds can be used as a framework for molecular design, property prediction, and identifying outliers that suggest new bioisosteres or interesting effects beyond typical models.
Review on Computational Bioinformatics and Molecular Modelling Novel Tool for...ijtsrd
Advancement in science and technology has brought a remarkable change in the field of drug discovery. Earlier it was very difficult to predict the target for receptor but nowadays, it is easy and robust task to dock the target protein with ligand and binding affinity is calculated. Docking helps in the virtual screening of drug along with its hit identification. There are two approaches through which docking can be carried out, shape complementary and stimulation approach. There are many procedures involved in carrying out docking and all require different software's and algorithms. Molecular docking serves as a good platform to screen a large number of ligands and is useful in Drug-DNA studies. This review mainly focuses on the general idea of molecular docking and discusses its major applications, different types of interaction involved and types of docking. Rishabh Jain "Review on Computational Bioinformatics and Molecular Modelling: Novel Tool for Drug Discovery" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-1 , December 2018, URL: http://www.ijtsrd.com/papers/ijtsrd18914.pdf
http://www.ijtsrd.com/pharmacy/pharmacoinformatics/18914/review-on-computational-bioinformatics-and-molecular-modelling-novel-tool-for-drug-discovery/rishabh-jain
Improving the performance of k nearest neighbor algorithm for the classificat...IAEME Publication
The document discusses improving the performance of the k-nearest neighbor (kNN) algorithm for classifying diabetes datasets with missing values. It first provides background on diabetes and challenges with missing data. It then describes various data preprocessing techniques used to handle missing values, including mean imputation. The document outlines the kNN classification algorithm and metrics like accuracy and error rate to evaluate performance. It applies these techniques to the Pima Indian diabetes dataset and finds that imputing missing values along with suitable preprocessing like normalization increases classification accuracy compared to ignoring missing values or imputation alone.
Advanced Statistical Manual for Ayurveda ResearchAyurdata
These slides covers more advanced statistical applications including that in data science.
The mode of presentation is that the concept is introduced first, followed by illustration and the use in a real context.
Trust Enhanced Role Based Access Control Using Genetic Algorithm IJECEIAES
Improvements in technological innovations have become a boon for business organizations, firms, institutions, etc. System applications are being developed for organizations whether small-scale or large-scale. Taking into consideration the hierarchical nature of large organizations, security is an important factor which needs to be taken into account. For any healthcare organization, maintaining the confidentiality and integrity of the patients’ records is of utmost importance while ensuring that they are only available to the authorized personnel. The paper discusses the technique of Role-Based Access Control (RBAC) and its different aspects. The paper also suggests a trust enhanced model of RBAC implemented with selection and mutation only ‘Genetic Algorithm’. A practical scenario involving healthcare organization has also been considered. A model has been developed to consider the policies of different health departments and how it affects the permissions of a particular role. The purpose of the algorithm is to allocate tasks for every employee in an automated manner and ensures that they are not over-burdened with the work assigned. In addition, the trust records of the employees ensure that malicious users do not gain access to confidential patient data.
Virtual Toxicity panels focussed on interpretable machine learning models that can guide medicinal chemists to identify critical substructures that are assocaited with toxicities.
Vinayaka : A Semi-Supervised Projected Clustering Method Using Differential E...ijseajournal
- The document presents VINAYAKA, a semi-supervised projected clustering method using differential evolution.
- VINAYAKA uses a hybrid cluster validation index combining the Subspace Clustering Quality Estimate index (for internal validation) and Gini index gain (for external validation). Differential evolution optimizes this index to find optimal subspace cluster centers.
- The method is tested on the Wisconsin breast cancer dataset, and synthetic datasets are used to demonstrate that the hybrid index can identify the correct number of clusters more accurately than an internal index alone.
Dynamic Radius Species Conserving Genetic Algorithm for Test Generation for S...ijseajournal
This document summarizes a research paper that proposes a new approach called Dynamic-radius Species-conserving Genetic Algorithm (DSGA) for generating structural test cases using a genetic algorithm. DSGA aims to generate a complete test suite with a single run by finding test cases that cover different areas of the program structure. It begins by finding test cases that cover some areas, then excludes those areas to search for test cases covering other uncovered areas, similar to how humans generate structural test cases. The paper evaluates DSGA on the Triangle Classification algorithm and finds it able to generate a complete test suite without limitations of other genetic algorithm approaches for structural test case generation.
Prediction of Dengue, Diabetes and Swine Flu using Random Forest Classificati...IRJET Journal
This document describes a disease prediction system that uses the Random Forest classification algorithm to predict Dengue, diabetes, and swine flu. The system trains on labeled datasets for each disease. It then takes user-entered symptoms as input and predicts the likelihood of each disease. If a disease is predicted to be positive, the system recommends a specialized doctor. The document discusses related work on disease prediction using data mining techniques. It provides an overview of how the Random Forest algorithm works for classification problems and ensemble learning. The proposed system aims to help users predict diseases and find appropriate doctors for treatment.
The document summarizes recent developments in using machine learning techniques for computational drug docking. It finds that machine learning methods, such as random forests, can more accurately predict binding affinity between proteins and ligands compared to traditional scoring functions. Specifically, the best random forest model achieved a correlation of 0.803 between predicted and experimental binding affinity, compared to 0.644 for classical scoring functions. Machine learning also more accurately ranks ligands and identifies the top binding pose. The document concludes that machine learning is better able to utilize relevant molecular features for computational drug docking compared to traditional methods.
A Critical Assessment Of Mus Musculus Gene Function Prediction Using Integrat...Sara Alvarez
This study assessed methods for predicting gene function in mice using integrated genomic data. Researchers provided standardized mouse genomic and functional annotation data to 9 bioinformatics teams. The teams used this data to independently train classifiers and predict functions, defined by Gene Ontology terms, for over 21,000 mouse genes. The best performing predictions were combined. This approach inferred functions for 76% of mouse genes, including 5,000 previously uncharacterized genes. At a 20% recall rate, the unified predictions averaged 41% precision, with 26% of terms achieving over 90% precision. The results demonstrate that currently available mammalian data allows predicting gene functions with both breadth and accuracy, including many novel predictions for previously uncharacterized genes.
This document discusses issues with commonly used methods in molecular design and data analysis, including ligand efficiency metrics (LEMs). It argues that LEMs make unfounded assumptions about relationships between activity and risk factors. Instead, residuals from modeling activity data directly should be used to quantify how much activity exceeds trends, as they are invariant to choices like standard concentration and treat all risk factors consistently. The document advocates understanding data properties before analysis and avoiding practices like arbitrarily binning data that can distort correlations.
IRJET- Classification of Chemical Medicine or Drug using K Nearest Neighb...IRJET Journal
This document proposes using a combination of K-nearest neighbors (KNN) and genetic algorithms to classify chemical medicine or drug data with improved accuracy. KNN is described as a simple and effective classification algorithm that stores training data instances. Genetic algorithms are presented as evolutionary algorithms useful for optimization problems. The proposed system applies genetic search to rank attribute importance, selects high-ranked attributes, and then applies both KNN and genetic algorithms to classify the drug data, aiming to improve classification accuracy over using either technique alone. The combination of KNN and genetic algorithms is expected to better optimize classification of complex medical data compared to other algorithms.
Thermodynamics for medicinal chemistry designPeter Kenny
This document discusses concepts in medicinal chemistry design and thermodynamics. It begins by outlining some challenges in drug discovery, such as targeting weakly linked disease targets and predicting toxicity. It then discusses molecular design approaches, including controlling compound properties and sampling chemical space. Key concepts discussed include target engagement potential, property-based design to find an optimal "sweet spot", and using thermodynamics and molecular interactions to analyze activity and properties. The document questions the use of rules and guidelines in medicinal chemistry and advocates analyzing data to understand actual trends rather than assuming functional forms. It also discusses issues with ligand efficiency metrics and advocates using residuals to quantify activity compared to observed trends in the data.
APPLICATION OF CLONAL SELECTION IMMUNE SYSTEM METHOD FOR OPTIMIZATION OF DIST...UniversitasGadjahMada
This paper proposes an application of clonal selection immune system method for optimization of distribution network. The distribution network with high-performance is a network that has a low power loss, better voltage profile, and loading balance among feeders. The task for improving the performance of the distribution network is optimization of network configuration. The optimization has become a necessary study with the presence of DG in entire networks. In this work, optimization of network configuration is based on an AIS algorithm. The methodology has been tested in a model of 33 bus IEEE radial distribution networks with and without DG integration. The results have been showed that the optimal configuration of the distribution network is able to reduce power loss and to improve the voltage profile of the distribution network significantly.
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
A clonal based algorithm for the reconstruction of genetic network using s sy...eSAT Journals
Abstract Motivation: Gene regulatory network is the network based approach to represent the interactions between genes. DNA microarray is the most widely used technology for extracting the relationships between thousands of genes simultaneously. Gene microarray experiment provides the gene expression data for a particular condition and varying time periods. The expression of a particular gene depends upon the biological conditions and other genes. In this paper, we propose a new method for the analysis of microarray data. The proposed method makes use of S-system, which is a well-accepted model for the gene regulatory network reconstruction. Since the problem has multiple solutions, we have to identify an optimized solution. Evolutionary algorithms have been used to solve such problems. Though there are a number of attempts already been carried out by various researchers, the solutions are still not that satisfactory with respect to the time taken and the degree of accuracy achieved. Therefore, there is a need of huge amount further work in this topic for achieving solutions with improved performances. Results: In this work, we have proposed Clonal selection algorithm for identifying optimal gene regulatory network. The approach is tested on the real life data: SOS Ecoli DNA repairing gene expression data. It is observed that the proposed algorithm converges much faster and provides better results than the existing algorithms. Index Terms: Microarray analysis, Evolutionary Algorithm, Artificial Immune System, S-system, Gene Regulatory Network, SOS Ecoli DNA repairing, Clonal Selection Algorithm.
Mining Big datasets to create and validate machine learning modelsSean Ekins
This document summarizes efforts to mine large datasets to create and validate machine learning models for drug discovery. It discusses using datasets from PubChem, ChEMBL, and ToxCast containing hundreds of thousands to millions of compounds tested against various targets and endpoints. Models were built using these datasets with Bayesian algorithms and fingerprints. The models achieved good performance and were shared online and incorporated into mobile apps. Future work discussed expanding this approach to even larger datasets from PubChem and ToxCast to further test scaling of algorithms and model validation.
Property-based molecular design: where next? (12-Jun-2015)Peter Kenny
The document discusses property-based molecular design and some challenges in drug discovery. It notes that molecular design aims to control compound behavior through manipulation of molecular properties in a hypothesis-driven or prediction-driven manner. However, toxicity can be unpredictable and measuring free drug concentrations in vivo is difficult. The document also discusses using structural relationships between compounds as a framework for molecular design and analysis of activity and properties, and notes that both hypothesis-driven and prediction-driven approaches have limitations that require further consideration.
Improving the effectiveness of information retrieval system using adaptive ge...ijcsit
The document describes research into improving the effectiveness of information retrieval systems using an adaptive genetic algorithm. A genetic algorithm with variable crossover and mutation probabilities (adaptive GA) is investigated. The adaptive GA is tested on 242 Arabic abstracts using three information retrieval models: vector space model, extended Boolean model, and language model. Results show the adaptive GA approach improves retrieval effectiveness over traditional genetic algorithms and baseline information retrieval systems, as measured by average recall and precision. Key aspects of the adaptive GA used include variable crossover and mutation probabilities tuned during the search process, and fitness functions based on document retrieval order.
Initial Optimal Parameters of Artificial Neural Network and Support Vector Re...IJECEIAES
This paper presents architecture of backpropagation Artificial Neural Network (ANN) and Support Vector Regression (SVR) models in supervised learning process for cement demand dataset. This study aims to identify the effectiveness of each parameter of mean square error (MSE) indicators for time series dataset. The study varies different random sample in each demand parameter in the network of ANN and support vector function as well. The variations of percent datasets from activation function, learning rate of sigmoid and purelin, hidden layer, neurons, and training function should be applied for ANN. Furthermore, SVR is varied in kernel function, lost function and insensitivity to obtain the best result from its simulation. The best results of this study for ANN activation function is Sigmoid. The amount of data input is 100% or 96 of data, 150 learning rates, one hidden layer, trinlm training function, 15 neurons and 3 total layers. The best results for SVR are six variables that run in optimal condition, kernel function is linear, loss function is ౬ -insensitive, and insensitivity was 1. The better results for both methods are six variables. The contribution of this study is to obtain the optimal parameters for specific variables of ANN and SVR.
Aspects of pharmaceutical molecular design (Fidelta version)Peter Kenny
This document discusses various aspects of pharmaceutical molecular design. It touches on three key points:
1) Pharmaceutical molecular design aims to control compound behavior through manipulation of molecular properties in a hypothesis-driven or prediction-driven manner.
2) Hypothesis-driven design frameworks help efficiently assemble structure-activity relationships to better understand molecules and ask insightful questions.
3) Prediction-driven design assumes predictive models can be built with sufficient accuracy, though issues like non-uniform sampling of chemical space and overfitting remain challenges.
partition coefficients in drug discoveryPeter Kenny
Partition coefficients are commonly used to model drug permeability and solubility. While octanol/water is typically used, it may not fully account for hydrogen bonding abilities. Differences between octanol/water and alkane/water logP values can provide insights into a drug's hydrogen bonding. A ClogPalk model was developed using molecular surface area and functional group perturbations to predict alkane/water logP. Structural relationships between compounds can be used as a framework for molecular design, property prediction, and identifying outliers that suggest new bioisosteres or interesting effects beyond typical models.
Review on Computational Bioinformatics and Molecular Modelling Novel Tool for...ijtsrd
Advancement in science and technology has brought a remarkable change in the field of drug discovery. Earlier it was very difficult to predict the target for receptor but nowadays, it is easy and robust task to dock the target protein with ligand and binding affinity is calculated. Docking helps in the virtual screening of drug along with its hit identification. There are two approaches through which docking can be carried out, shape complementary and stimulation approach. There are many procedures involved in carrying out docking and all require different software's and algorithms. Molecular docking serves as a good platform to screen a large number of ligands and is useful in Drug-DNA studies. This review mainly focuses on the general idea of molecular docking and discusses its major applications, different types of interaction involved and types of docking. Rishabh Jain "Review on Computational Bioinformatics and Molecular Modelling: Novel Tool for Drug Discovery" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-1 , December 2018, URL: http://www.ijtsrd.com/papers/ijtsrd18914.pdf
http://www.ijtsrd.com/pharmacy/pharmacoinformatics/18914/review-on-computational-bioinformatics-and-molecular-modelling-novel-tool-for-drug-discovery/rishabh-jain
Improving the performance of k nearest neighbor algorithm for the classificat...IAEME Publication
The document discusses improving the performance of the k-nearest neighbor (kNN) algorithm for classifying diabetes datasets with missing values. It first provides background on diabetes and challenges with missing data. It then describes various data preprocessing techniques used to handle missing values, including mean imputation. The document outlines the kNN classification algorithm and metrics like accuracy and error rate to evaluate performance. It applies these techniques to the Pima Indian diabetes dataset and finds that imputing missing values along with suitable preprocessing like normalization increases classification accuracy compared to ignoring missing values or imputation alone.
Advanced Statistical Manual for Ayurveda ResearchAyurdata
These slides covers more advanced statistical applications including that in data science.
The mode of presentation is that the concept is introduced first, followed by illustration and the use in a real context.
Trust Enhanced Role Based Access Control Using Genetic Algorithm IJECEIAES
Improvements in technological innovations have become a boon for business organizations, firms, institutions, etc. System applications are being developed for organizations whether small-scale or large-scale. Taking into consideration the hierarchical nature of large organizations, security is an important factor which needs to be taken into account. For any healthcare organization, maintaining the confidentiality and integrity of the patients’ records is of utmost importance while ensuring that they are only available to the authorized personnel. The paper discusses the technique of Role-Based Access Control (RBAC) and its different aspects. The paper also suggests a trust enhanced model of RBAC implemented with selection and mutation only ‘Genetic Algorithm’. A practical scenario involving healthcare organization has also been considered. A model has been developed to consider the policies of different health departments and how it affects the permissions of a particular role. The purpose of the algorithm is to allocate tasks for every employee in an automated manner and ensures that they are not over-burdened with the work assigned. In addition, the trust records of the employees ensure that malicious users do not gain access to confidential patient data.
Virtual Toxicity panels focussed on interpretable machine learning models that can guide medicinal chemists to identify critical substructures that are assocaited with toxicities.
Vinayaka : A Semi-Supervised Projected Clustering Method Using Differential E...ijseajournal
- The document presents VINAYAKA, a semi-supervised projected clustering method using differential evolution.
- VINAYAKA uses a hybrid cluster validation index combining the Subspace Clustering Quality Estimate index (for internal validation) and Gini index gain (for external validation). Differential evolution optimizes this index to find optimal subspace cluster centers.
- The method is tested on the Wisconsin breast cancer dataset, and synthetic datasets are used to demonstrate that the hybrid index can identify the correct number of clusters more accurately than an internal index alone.
Dynamic Radius Species Conserving Genetic Algorithm for Test Generation for S...ijseajournal
This document summarizes a research paper that proposes a new approach called Dynamic-radius Species-conserving Genetic Algorithm (DSGA) for generating structural test cases using a genetic algorithm. DSGA aims to generate a complete test suite with a single run by finding test cases that cover different areas of the program structure. It begins by finding test cases that cover some areas, then excludes those areas to search for test cases covering other uncovered areas, similar to how humans generate structural test cases. The paper evaluates DSGA on the Triangle Classification algorithm and finds it able to generate a complete test suite without limitations of other genetic algorithm approaches for structural test case generation.
Prediction of Dengue, Diabetes and Swine Flu using Random Forest Classificati...IRJET Journal
This document describes a disease prediction system that uses the Random Forest classification algorithm to predict Dengue, diabetes, and swine flu. The system trains on labeled datasets for each disease. It then takes user-entered symptoms as input and predicts the likelihood of each disease. If a disease is predicted to be positive, the system recommends a specialized doctor. The document discusses related work on disease prediction using data mining techniques. It provides an overview of how the Random Forest algorithm works for classification problems and ensemble learning. The proposed system aims to help users predict diseases and find appropriate doctors for treatment.
The document summarizes recent developments in using machine learning techniques for computational drug docking. It finds that machine learning methods, such as random forests, can more accurately predict binding affinity between proteins and ligands compared to traditional scoring functions. Specifically, the best random forest model achieved a correlation of 0.803 between predicted and experimental binding affinity, compared to 0.644 for classical scoring functions. Machine learning also more accurately ranks ligands and identifies the top binding pose. The document concludes that machine learning is better able to utilize relevant molecular features for computational drug docking compared to traditional methods.
A Critical Assessment Of Mus Musculus Gene Function Prediction Using Integrat...Sara Alvarez
This study assessed methods for predicting gene function in mice using integrated genomic data. Researchers provided standardized mouse genomic and functional annotation data to 9 bioinformatics teams. The teams used this data to independently train classifiers and predict functions, defined by Gene Ontology terms, for over 21,000 mouse genes. The best performing predictions were combined. This approach inferred functions for 76% of mouse genes, including 5,000 previously uncharacterized genes. At a 20% recall rate, the unified predictions averaged 41% precision, with 26% of terms achieving over 90% precision. The results demonstrate that currently available mammalian data allows predicting gene functions with both breadth and accuracy, including many novel predictions for previously uncharacterized genes.
- The document proposes a multi-view stacking ensemble method for drug-target interaction (DTI) prediction that combines predictions from multiple machine learning models trained on different drug and target feature view combinations.
- It generates 126 view combination datasets from 14 drug views and 9 target views, then trains extra trees, random forest, and XGBoost classifiers on each view combination. Predictions from these base models are then combined using a stacking ensemble with an extra trees meta-learner.
- The method is shown to outperform single models and voting ensembles, and calibration of the meta-learner and use of local imbalance measures provide further improvements to predictive performance on DTI prediction tasks.
Multivariate sample similarity measure for feature selection with a resemblan...IJECEIAES
Feature selection improves the classification performance of machine learning models. It also identifies the important features and eliminates those with little significance. Furthermore, feature selection reduces the dimensionality of training and testing data points. This study proposes a feature selection method that uses a multivariate sample similarity measure. The method selects features with significant contributions using a machine-learning model. The multivariate sample similarity measure is evaluated using the University of California, Irvine heart disease dataset and compared with existing feature selection methods. The multivariate sample similarity measure is evaluated with metrics such as minimum subset selected, accuracy, F1-score, and area under the curve (AUC). The results show that the proposed method is able to diagnose chest pain, thallium scan, and major vessels scanned using X-rays with a high capability to distinguish between healthy and heart disease patients with a 99.6% accuracy.
The increased availability of biomedical data, particularly in the public domain, offers the opportunity to better understand human health and to develop effective therapeutics for a wide range of unmet medical needs. However, data scientists remain stymied by the fact that data remain hard to find and to productively reuse because data and their metadata i) are wholly inaccessible, ii) are in non-standard or incompatible representations, iii) do not conform to community standards, and iv) have unclear or highly restricted terms and conditions that preclude legitimate reuse. These limitations require a rethink on data can be made machine and AI-ready - the key motivation behind the FAIR Guiding Principles. Concurrently, while recent efforts have explored the use of deep learning to fuse disparate data into predictive models for a wide range of biomedical applications, these models often fail even when the correct answer is already known, and fail to explain individual predictions in terms that data scientists can appreciate. These limitations suggest that new methods to produce practical artificial intelligence are still needed.
In this talk, I will discuss our work in (1) building an integrative knowledge infrastructure to prepare FAIR and "AI-ready" data and services along with (2) neurosymbolic AI methods to improve the quality of predictions and to generate plausible explanations. Attention is given to standards, platforms, and methods to wrangle knowledge into simple, but effective semantic and latent representations, and to make these available into standards-compliant and discoverable interfaces that can be used in model building, validation, and explanation. Our work, and those of others in the field, creates a baseline for building trustworthy and easy to deploy AI models in biomedicine.
Bio
Dr. Michel Dumontier is the Distinguished Professor of Data Science at Maastricht University, founder and executive director of the Institute of Data Science, and co-founder of the FAIR (Findable, Accessible, Interoperable and Reusable) data principles. His research explores socio-technological approaches for responsible discovery science, which includes collaborative multi-modal knowledge graphs, privacy-preserving distributed data mining, and AI methods for drug discovery and personalized medicine. His work is supported through the Dutch National Research Agenda, the Netherlands Organisation for Scientific Research, Horizon Europe, the European Open Science Cloud, the US National Institutes of Health, and a Marie-Curie Innovative Training Network. He is the editor-in-chief for the journal Data Science and is internationally recognized for his contributions in bioinformatics, biomedical informatics, and semantic technologies including ontologies and linked data.
Efficiency of Prediction Algorithms for Mining Biological DatabasesIOSR Journals
This document analyzes the efficiency of various prediction algorithms for mining biological databases. It discusses prediction through mining biological databases to identify disease risks. It then evaluates several prediction algorithms (ZeroR, OneR, JRip, PART, Decision Table) on a breast cancer dataset using measures like accuracy, sensitivity, specificity, and predictive values. The results show that the JRip and PART algorithms generally had the highest accuracy rates, around 70%, while ZeroR had the lowest accuracy. However, ZeroR had a perfect positive predictive value. The study aims to assess the most efficient algorithms for predictive mining of biological data.
This document describes a study that uses machine learning algorithms to efficiently predict DNA-binding proteins. Support vector machines and cascade correlation neural networks are optimized and compared to determine the best performing model. The SVM model achieves 86.7% accuracy at predicting DNA-binding proteins using features like overall charge, patch size, and amino acid composition of proteins. The CCNN model achieves lower accuracy of 75.4%. The study aims to improve on previous work by using the standard jack-knife validation technique to evaluate model performance on unseen data.
Impact of Classification Algorithms on Cardiotocography Dataset for Fetal Sta...BRNSSPublicationHubI
This document summarizes a study that used four classification algorithms (KNN, decision tree, support vector machine, naive Bayes) to predict fetal state from cardiotocography data. The researchers first cleaned the data, removed outliers, and selected features before splitting the data into training and test sets. They then trained and evaluated the four classification models and compared their performance on the full dataset and a reduced dataset with outliers removed and fewer features. The best performing model could help develop automated clinical decision support systems to analyze cardiotocography records.
Novel modelling of clustering for enhanced classification performance on gene...IJECEIAES
Gene expression data is popularized for its capability to disclose various disease conditions. However, the conventional procedure to extract gene expression data itself incorporates various artifacts that offer challenges in diagnosis a complex disease indication and classification like cancer. Review of existing research approaches indicates that classification approaches are few to proven to be standard with respect to higher accuracy and applicable to gene expression data apart from unaddresed problems of computational complexity. Therefore, the proposed manuscript introduces a novel and simplified model capable using Graph Fourier Transform, Eigen Value and vector for offering better classification performance considering case study of microarray database, which is one typical example of gene expressiondata. The study outcome shows that proposed system offers comparatively better accuracy and reduced computational complexity with the existing clustering approaches.
The pLoc bal-mHum is a powerful web-serve for predicting the subcellular loca...IJBNT Journal
This document describes a powerful web server called pLoc bal-mHum that was developed in 2019 to predict the subcellular localization of human proteins based solely on their amino acid sequences. The tool uses artificial intelligence/machine learning techniques and was shown to predict localization with 94-100% accuracy across 14 different subcellular locations. The tool follows Chou's 5-step rule for developing predictive models in a logical, transparent, and reproducible manner. It has been widely cited and used by other researchers for developing predictive models of post-translational modifications and subcellular locations of proteins from various organisms.
This document is a research statement by Chien-Wei (Masaki) Lin that summarizes his past and ongoing methodology and collaborative research projects. It discusses his interests in developing statistical methods for analyzing multi-omics data, including power calculation tools, meta-analysis and integrative analysis methods. It also summarizes some of Lin's collaboration projects applying these statistical methods to study topics like brain aging, major depressive disorder, and cardiovascular epidemiology. The document references 18 of Lin's publications and provides an overview of his diverse experience and future research plans developing statistical tools and methods and applying them to biological problems.
DSAGLSTM-DTA: Prediction of Drug-Target Affinity using Dual Self-Attention an...mlaij
The research on affinity between drugs and targets (DTA) aims to effectively narrow the target search space for drug repurposing. Therefore, reasonable prediction of drug and target affinities can minimize the waste of resources such as human and material resources. In this work, a novel graph-based model called DSAGLSTM-DTA was proposed for DTA prediction. The proposed model is unlike previous graph-based drug-target affinity model, which incorporated self-attention mechanisms in the feature extraction process of drug molecular graphs to fully extract its effective feature representations. The features of each atom in the 2D molecular graph were weighted based on attention score before being aggregated as molecule representation and two distinct pooling architectures, namely centralized and distributed architectures were implemented and compared on benchmark datasets. In addition, in the course of processing protein sequences, inspired by the approach of protein feature extraction in GDGRU-DTA, we continue to interpret protein sequences as time series and extract their features using Bidirectional Long Short-Term Memory (BiLSTM) networks, since the context-dependence of long amino acid sequences. Similarly, DSAGLSTM-DTA also utilized a self-attention mechanism in the process of protein feature extraction to obtain comprehensive representations of proteins, in which the final hidden states for element in the batch were weighted with the each unit output of LSTM, and the results were represented as the final feature of proteins. Eventually, representations of drug and protein were concatenated and fed into prediction block for final prediction. The proposed model was evaluated on different regression datasets and binary classification datasets, and the results demonstrated that DSAGLSTM-DTA was superior to some state-ofthe-art DTA models and exhibited good generalization ability.
DSAGLSTM-DTA: PREDICTION OF DRUG-TARGET AFFINITY USING DUAL SELF-ATTENTION AN...mlaij
The research on affinity between drugs and targets (DTA) aims to effectively narrow the target search
space for drug repurposing. Therefore, reasonable prediction of drug and target affinities can minimize the
waste of resources such as human and material resources. In this work, a novel graph-based model called
DSAGLSTM-DTA was proposed for DTA prediction. The proposed model is unlike previous graph-based
drug-target affinity model, which incorporated self-attention mechanisms in the feature extraction process
of drug molecular graphs to fully extract its effective feature representations. The features of each atom in
the 2D molecular graph were weighted based on attention score before being aggregated as molecule
representation and two distinct pooling architectures, namely centralized and distributed architectures
were implemented and compared on benchmark datasets. In addition, in the course of processing protein
sequences, inspired by the approach of protein feature extraction in GDGRU-DTA, we continue to
interpret protein sequences as time series and extract their features using Bidirectional Long Short-Term
Memory (BiLSTM) networks, since the context-dependence of long amino acid sequences. Similarly,
DSAGLSTM-DTA also utilized a self-attention mechanism in the process of protein feature extraction to
obtain comprehensive representations of proteins, in which the final hidden states for element in the batch
were weighted with the each unit output of LSTM, and the results were represented as the final feature of
proteins. Eventually, representations of drug and protein were concatenated and fed into prediction block
for final prediction. The proposed model was evaluated on different regression datasets and binary
classification datasets, and the results demonstrated that DSAGLSTM-DTA was superior to some state-ofthe-art DTA models and exhibited good generalization ability.
An efficient feature selection algorithm for health care data analysisjournalBEEI
Diabete is a silent killer, which will slowly kill the person if it goes undetected. The existing system which uses F-score method and K-means clustering of checking whether a person has diabetes or not are 100% accurate, and anything which isn't a 100% is not acceptable in the medical field, as it could cost the lives of many people. Our proposed system aims at using some of the best features of the existing algorithms to predict diabetes, and combine these and based on these features; This research work turns them into a novel algorithm, which will be 100% accurate in its prediction. With the surge in technological advancements, we can use data mining to predict when a person would be diagnosed with diabetes. Specifically, we analyze the best features of chi-square algorithm and advanced clustering algorithm (ACA). This research work is done using the Pima Indian Diabetes dataset provided by National Institutes of Diabetes and Digestive and Kidney Diseases. Using classification theorems and methods we can consider different factors like age, BMI, blood pressure and the importance given to these attributes overall, and singles these attributes out, and use them for the prediction of diabetes.
Cadd and molecular modeling for M.PharmShikha Popali
THE CADD IS FOR THE DRUG DEVELOPMENT THE DIFFERENT STRATEGIES ARE MENTIONED LIKE QSAR MOLECULAR DOCKING, THE DIFFERENT DIMNSIONAL FORMS OF QSAR , THE ADVANCE SAR of it.
The document discusses collaborative drug discovery efforts for neglected diseases like tuberculosis. It describes how the Collaborative Drug Discovery (CDD) platform facilitates open data sharing and enables building predictive models across public and private datasets from multiple organizations. CDD has supported over 20 labs working on tuberculosis through cheminformatics analysis of large compound libraries and building Bayesian classification models to prioritize compounds for testing.
1) The document discusses various medical image fusion techniques including pixel level, feature level, and decision level fusion.
2) It proposes a novel pixel level fusion method called Iterative Block Level Principal Component Averaging fusion that divides images into blocks and calculates principal components for each block.
3) Experimental results on fusing noise free and noise filtered MR images show that the proposed method performs well in terms of average mutual information and structural similarity compared to other algorithms.
Talk at Yale University April 26th 2011: Applying Computational Modelsfor To...Sean Ekins
The document discusses applying computational models to problems in toxicology, drug discovery, and beyond. It summarizes recent work using machine learning models and other in silico techniques to predict drug-induced liver injury (DILI) and interactions with transporters like hOCTN2. Models were able to classify compounds as DILI-positive or negative with over 75% accuracy when tested on external datasets. The techniques discussed could help prioritize compounds for further testing and filter libraries to avoid reactive or toxic features.