This document presents DockQ, a new continuous quality measure for protein-protein docking models. DockQ combines three existing measures - fraction of native contacts (Fnat), ligand root mean square deviation (LRMS), and interface root mean square deviation (iRMS) - into a single score between 0 and 1. DockQ was derived and optimized using a large dataset of docking models. When applied to independent test sets, DockQ was able to accurately recapitulate the original four-class CAPRI quality assessment of docking models. DockQ therefore provides a higher resolution assessment of model quality compared to the discrete CAPRI classes alone.
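The combination described above can be written down compactly. The sketch below follows the published DockQ definition, in which each RMSD term is mapped into [0, 1] by a scaled inverse-square function with characteristic distances d1 = 8.5 Å (LRMS) and d2 = 1.5 Å (iRMS):

```python
def dockq(fnat, lrms, irms, d1=8.5, d2=1.5):
    """DockQ-style combination of the three CAPRI measures into [0, 1].

    fnat: fraction of native contacts (already in [0, 1]).
    lrms, irms: ligand and interface RMSD in Angstroms.
    """
    def rms_scaled(rms, d):
        # Maps an RMSD of 0 to 1.0 and decays toward 0 as RMSD grows.
        return 1.0 / (1.0 + (rms / d) ** 2)

    return (fnat + rms_scaled(lrms, d1) + rms_scaled(irms, d2)) / 3.0

print(dockq(1.0, 0.0, 0.0))   # a perfect model scores 1.0
print(dockq(0.5, 10.0, 3.0))  # a partially correct model scores lower
```

Because the score is continuous, thresholds on it can then be used to bin models back into the four discrete CAPRI classes.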
Predicting Value of Binding Constants of Organic Ligands to Beta-Cyclodextrin... (Maciej Przybyłek)
A quantitative structure–property relationship (QSPR) model was formulated to quantify the binding constant (lnK) of a series of ligands to beta-cyclodextrin (β-CD). For this purpose, the multivariate adaptive regression splines (MARSplines) methodology was adopted, with molecular descriptors derived from simplified molecular input line entry specification (SMILES) strings. This approach allows the discovery of regression equations consisting of non-linear components (basis functions) built as combinations of molecular descriptors. The model was subjected to standard internal and external validation procedures, which indicated its high predictive power. The appearance of polarity-related descriptors, such as XlogP, confirms the hydrophobic nature of the cyclodextrin cavity. The model can be used for predicting the affinity of new ligands to β-CD. A non-standard application was also proposed: classification into Biopharmaceutical Classification System (BCS) drug types. A single parameter, the estimated value of lnK, was found to be sufficient to distinguish highly permeable drugs (BCS classes I and II) from poorly permeable ones (BCS classes III and IV). In general, drugs of the former group exhibit higher affinity to β-CD than those of the latter.
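To illustrate what a MARSplines regression equation looks like: each basis function is a hinge, max(0, x − t) or its mirror max(0, t − x), applied to a descriptor, and terms may multiply several hinges together. The descriptor names, knots, and coefficients below are purely illustrative and not taken from the paper:

```python
def hinge(x, knot, sign=1):
    # MARS basis function: max(0, x - knot) for sign=+1,
    # or the mirrored max(0, knot - x) for sign=-1.
    return max(0.0, sign * (x - knot))

def lnK_hat(d1, d2):
    # Toy MARS-style equation over two hypothetical descriptors d1, d2,
    # including one interaction term (a product of two hinges).
    return (1.2
            + 0.8 * hinge(d1, 2.0)
            - 0.5 * hinge(d2, 1.0, sign=-1)
            + 0.3 * hinge(d1, 2.0) * hinge(d2, 3.0))
```

The piecewise-linear hinges are what let the fitted equation capture the non-linear descriptor effects the abstract mentions.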
This document provides an overview of quantitative structure-property relationship (QSPR) modeling for drug disposition prediction. It discusses why QSPRs are useful, the general methodology used in QSPR modeling including descriptor generation, statistical analysis and model validation. Specific approaches covered include multiple linear regression, partial least squares, artificial neural networks and internal/external validation techniques. The overall goal of a QSPR approach is to mathematically relate molecular descriptors to physicochemical properties and pharmacokinetic parameters to allow for drug property prediction without additional experiments.
The International Journal of Engineering & Science is aimed at providing a platform for researchers, engineers, scientists, or educators to publish their original research results, to exchange new ideas, to disseminate information in innovative designs, engineering experiences and technological skills. It is also the Journal's objective to promote engineering and technology education. All papers submitted to the Journal will be blind peer-reviewed. Only original articles will be published.
The papers for publication in The International Journal of Engineering & Science are selected through rigorous peer reviews to ensure originality, timeliness, relevance, and readability.
1) The document presents ProQDock, a scoring function that predicts the quality of protein-protein docking models. ProQDock uses machine learning techniques, specifically support vector machines, trained on various structural features to predict an absolute quality score for docking models called DockQ.
2) When tested on independent datasets, ProQDock performed better than existing state-of-the-art scoring functions at ranking docking models and identifying correct models.
3) The document describes the training and testing of ProQDock, including the datasets used for training and evaluation, the target quality score (DockQ), feature selection, machine learning methods, and evaluation on independent test sets.
The document describes the complementarity plot (CP), a validation tool for protein structures based on packing and electrostatics of buried residues. The CP plots surface complementarity against electrostatic complementarity for buried residues. The document outlines how local and global scores are designed based on the CP to detect various errors, such as incorrect side chain orientations, diffuse main chain errors, and imbalanced charges. Validation results show the CP is effective at discriminating obsolete structures from updated ones and identifying other errors. Applications of the CP in protein modeling and design are also demonstrated.
GPCODON ALIGNMENT: A GLOBAL PAIRWISE CODON BASED SEQUENCE ALIGNMENT APPROACH (ijdms)
The alignment of two DNA sequences is a basic step in the analysis of biological data, and aligning long DNA sequences is one of the most interesting problems in bioinformatics. Several techniques have been developed to solve the sequence alignment problem, such as dynamic programming and heuristic algorithms. In this paper, we introduce GPCodon Alignment, a pairwise DNA–DNA method for global sequence alignment that improves the accuracy of pairwise sequence alignment. We use a new scoring matrix, the empirical codon substitution matrix, to produce the final alignment. Using this matrix in our technique enabled the discovery of relationships between sequences that could not be found using traditional matrices. In addition, we present experimental results showing the performance of the proposed technique over eleven datasets with an average length of 2967 bp. We compared the efficiency and accuracy of our technique against a comparable tool, "Pairwise Align Codons" [1].
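The dynamic-programming route mentioned above is, at its core, the Needleman-Wunsch recurrence. The sketch below scores single characters with flat match/mismatch values; the method in the paper would instead look up codon triplets in the empirical codon substitution matrix:

```python
def global_align_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score (score only, no traceback).

    dp[i][j] holds the best score for aligning a[:i] with b[:j].
    """
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            dp[i][j] = max(dp[i - 1][j - 1] + sub,  # substitution
                           dp[i - 1][j] + gap,      # gap in b
                           dp[i][j - 1] + gap)      # gap in a
    return dp[n][m]

print(global_align_score("GATTACA", "GCATGCU"))  # -> 0
```

A codon-based variant would advance i and j in steps of three and replace `sub` with a matrix lookup on the two triplets.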
PROGRAM PHASE IN LIGAND-BASED PHARMACOPHORE MODEL GENERATION AND 3D DATABASE ... (Simone Brogi)
We have applied a novel approach to generate a ligand-based pharmacophore model. The pharmacophore was built from a set of 42 compounds showing activity against the MCF-7 cell line, derived from human mammary adenocarcinoma, using the program PHASE, implemented in the Schrödinger software suite. PHASE is a highly flexible system for common pharmacophore identification and assessment and for 3D-database creation and searching. The best pharmacophore hypothesis showed five features: two hydrogen-bond acceptors, one hydrogen-bond donor, and two aromatic rings. The structure–activity relationship (SAR) so acquired was applied within PHASE for molecular alignment in a comparative molecular field analysis (CoMFA) 3D-QSAR study. The 3D-QSAR model yielded a test set r2 equal to 0.97 and proved highly predictive with respect to an external test set of 18 compounds (r2 = 0.93). In summary, in this study we improved a previously developed Catalyst MCF-7 inhibitory pharmacophore and established a predictive 3D-QSAR model. We have further used this model to detect novel MCF-7 cell line inhibitors through 3D database searching.
Computer simulation in pharmacokinetics and pharmacodynamics (MOHAMMAD ASIM)
Computer simulation can be used at four levels in pharmacokinetics and pharmacodynamics: (1) whole organism level using lumped-parameter or physiologically-based pharmacokinetic models, (2) isolated tissue and organ level using distributed blood tissue exchange models, (3) cellular level modeling intracellular and membrane processes, and (4) protein and gene level including computational protein design and models of conditions like HIV viral load.
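At the whole-organism level (level 1), the simplest lumped-parameter model is a single well-stirred compartment with first-order elimination. The parameter values below are illustrative only:

```python
import math

def concentration(t, dose=100.0, vd=10.0, k=0.2):
    """One-compartment IV-bolus model: C(t) = (dose / Vd) * exp(-k * t).

    dose in mg, Vd (volume of distribution) in L, k (elimination rate
    constant) in 1/h, so C(t) is in mg/L.
    """
    return (dose / vd) * math.exp(-k * t)

t_half = math.log(2) / 0.2  # elimination half-life, about 3.47 h
print(concentration(0.0))   # initial concentration: 10.0 mg/L
```

Physiologically based models replace this single compartment with a system of coupled differential equations, one per organ, linked by blood flows.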
This document summarizes research applying deep learning techniques to predict epigenomic enhancer regions from DNA sequences. Four models - variations of Basset, DeepSea, DanQ, and a custom Greenside-Basset model - were trained on a dataset from the NIH Roadmap Epigenomics Mapping Consortium to label 1000-base-pair sequences as active or inactive for 57 cell types. The Basset variation achieved the best balanced accuracy of 72.8% and the highest precision and F1 scores, though precision was still modest because the models weighted negative examples heavily. Experimenting with different embeddings, loss functions, and architectures helped narrow down the models best suited to this sequence-labeling task.
This document discusses the development of a probabilistic network-level pavement management system using Markov processes. Pavements are grouped into families with similar characteristics. Markov transition probability matrices are developed for each family to model the uncertain deterioration of pavements over time. The Markov models are integrated with dynamic programming and prioritization programs to determine optimal maintenance and rehabilitation recommendations under budget constraints and provide the highest network performance. The developed system is applied to an airport pavement network as an example.
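The Markov machinery described above reduces to repeated multiplication of a condition-state vector by a transition probability matrix. The states and probabilities below are illustrative, not taken from the document:

```python
# Condition states 1 (best) .. 4 (worst); P[i][j] is the one-year
# probability of moving from state i+1 to state j+1 with no maintenance.
P = [
    [0.8, 0.2, 0.0, 0.0],
    [0.0, 0.7, 0.3, 0.0],
    [0.0, 0.0, 0.6, 0.4],
    [0.0, 0.0, 0.0, 1.0],  # worst state is absorbing without repair
]

def step(state, P):
    """One year of deterioration: state vector times transition matrix."""
    n = len(P)
    return [sum(state[i] * P[i][j] for i in range(n)) for j in range(n)]

state = [1.0, 0.0, 0.0, 0.0]  # a new pavement starts in the best state
for year in range(5):
    state = step(state, P)
# `state` now holds the probability of each condition after 5 years.
```

A maintenance action would be modeled as a second matrix that shifts probability back toward the better states; dynamic programming then chooses, per year and family, which matrix to apply under the budget constraint.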
3D QSAR approaches relate the biological activity of compounds to their 3D structural properties using statistical analysis. CoMFA is a commonly used 3D-QSAR method that involves aligning molecules, placing them in a grid, calculating electrostatic and steric field properties at each point, and correlating these descriptors to biological activity using PLS analysis. CoMFA results are often displayed as contour plots that identify regions where certain molecular properties increase or decrease activity. X-ray crystallography and NMR spectroscopy can provide experimental data on bioactive conformations.
1) The document compares the accuracy of empirical (HOSE code, neural network) and quantum-mechanical (QM) methods for predicting 13C NMR chemical shifts.
2) It analyzes 205 molecules where experimental and QM-calculated 13C shifts were published, and calculates shifts using HOSE code, neural network, and QM methods.
3) The mean absolute errors (MAE) were 1.58 ppm for HOSE code, 1.91 ppm for neural network, and 3.29 ppm for QM methods, indicating that the empirical methods provided more accurate predictions for this data set on average.
This document provides an overview of quantitative structure-activity relationship (QSAR) modeling approaches. It discusses various 3D-QSAR methods including contour map analysis, statistical methods like linear regression, principal components analysis, and pattern recognition techniques like cluster analysis and artificial neural networks. The importance of statistical parameters for evaluating and selecting the best QSAR model is also highlighted. In summary, the document outlines different 3D-QSAR modeling techniques, statistical analyses used in QSAR studies, and how statistics help in model selection and evaluation.
Computer simulations in pharmacokinetics and pharmacodynamics (GOKULAKRISHNAN S)
This document discusses computer simulations in pharmacokinetics and pharmacodynamics. It describes how whole organism, isolated tissue, and organ simulations work. For whole organism simulations, two approaches are used: lumped-parameter PK-PD modeling which uses differential equations to model the system over time, and physiological modeling which attempts to model interacting organs in detail. Isolated tissue and organ simulations are also discussed, focusing on models of the heart, liver, kidney and brain. The challenges of complexity and model selection are addressed.
Computational modelling of drug disposition (lalitajoshi9)
Computational modelling of drug disposition is an integral part of computer-aided drug design. Different kinds of tools are used to predict drug disposition in the human body. This topic in CADD explains the details of drug disposition, active transporters, and the relevant tools.
Review on Computational Bioinformatics and Molecular Modelling: Novel Tool for... (ijtsrd)
Advancement in science and technology has brought a remarkable change in the field of drug discovery. Earlier it was very difficult to predict the target for a receptor, but nowadays it is an easy and robust task to dock a target protein with a ligand and calculate the binding affinity. Docking helps in the virtual screening of drugs along with hit identification. There are two approaches through which docking can be carried out: shape complementarity and the simulation approach. There are many procedures involved in carrying out docking, and all require different software and algorithms. Molecular docking serves as a good platform to screen a large number of ligands and is useful in drug–DNA studies. This review mainly focuses on the general idea of molecular docking and discusses its major applications, the different types of interactions involved, and the types of docking. Rishabh Jain, "Review on Computational Bioinformatics and Molecular Modelling: Novel Tool for Drug Discovery", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3, Issue-1, December 2018, URL: http://www.ijtsrd.com/papers/ijtsrd18914.pdf
http://www.ijtsrd.com/pharmacy/pharmacoinformatics/18914/review-on-computational-bioinformatics-and-molecular-modelling-novel-tool-for-drug-discovery/rishabh-jain
This document discusses quantitative structure-activity relationships (QSAR) modeling techniques. It introduces 2D-QSAR which uses molecular descriptors to correlate structure and activity. It also discusses 3D-QSAR techniques like CoMFA and CoMSIA which use 3D molecular fields/properties and statistical methods like PLS to model activity. These techniques are useful for drug design, virtual screening, and predicting absorption, distribution, metabolism, excretion properties.
siRNA has become an indispensable tool for silencing gene expression. It can act as an antiviral agent in the RNAi pathway against plant diseases caused by plant viruses. However, identification of appropriate features for effective siRNA design has become a pressing issue for researchers that needs to be resolved. Feature selection is a vital pre-processing technique for bioinformatics data sets, used to find the most discriminative information, not only for dimensionality reduction and detection of relevant features but also for minimizing the cost associated with features when designing an accurate learning system. In this paper, we propose an ANN-based feature selection approach using a hybrid GA-PSO for selecting a feature subset by discarding irrelevant features and evaluating the cost of model training. The results showed that the proposed hybrid GA-PSO model outperformed general PSO.
Interaction fingerprint: 1D representation of 3D protein-ligand complexes (Vladimir Chupakhin)
The structural interaction fingerprint (IFP) was introduced to overcome the shortcomings of existing scoring functions. An IFP is a binary string encoding the presence or absence of interactions between a ligand and the amino acids of a protein binding site. It is a convenient way to compare and analyze binding poses of ligands.
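A minimal sketch of how such fingerprints are used: each bit marks one (residue, interaction type) pair, and two poses are compared with Tanimoto similarity. The bit layout below is illustrative:

```python
def tanimoto(fp1, fp2):
    """Tanimoto similarity of two equal-length binary fingerprints."""
    common = sum(a & b for a, b in zip(fp1, fp2))  # bits set in both
    either = sum(a | b for a, b in zip(fp1, fp2))  # bits set in either
    return common / either if either else 1.0

# Hypothetical 6-bit IFPs for two docking poses of the same ligand
# (e.g. bit 0 = H-bond to residue A, bit 1 = hydrophobic contact to B, ...).
pose_a = [1, 0, 1, 1, 0, 0]
pose_b = [1, 0, 1, 0, 0, 1]
print(tanimoto(pose_a, pose_b))  # 2 shared bits / 4 set bits -> 0.5
```

Because the 3D complex is reduced to a bit string, pose comparison and clustering become cheap fingerprint operations rather than structural superpositions.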
This document discusses molecular descriptors, which are mathematical representations of molecular properties generated by algorithms. It covers the types of descriptors, including 1D, 2D and 3D descriptors. 1D descriptors represent information from molecular formulas, while 2D descriptors capture size, shape and electronic distribution. 3D descriptors describe properties related to 3D conformation. Tools discussed for calculating descriptors include commercial software as well as open-source options like CDK and PaDEL. The utility of descriptors in applications like QSAR modeling is also mentioned.
3D QSAR techniques like CoMFA and CoMSIA can quantitatively correlate the biological activity of a series of compounds to their 3D molecular properties. CoMFA generates 3D interaction fields around aligned molecules using steric and electrostatic probes, while CoMSIA additionally considers hydrophobic and hydrogen bonding interactions. The document discusses these techniques and provides an example case study applying CoMFA to develop a QSAR model for human eosinophil phosphodiesterase inhibitors with a cross-validated R2 of 0.565. In conclusion, 3D QSAR is a valuable tool for understanding structure-activity relationships and aiding drug design and discovery efforts.
Lecture 11: Developing QSAR, evaluation of QSAR models and virtual screening (RAJAN ROLTA)
This document discusses the development and evaluation of quantitative structure-activity relationship (QSAR) models. It begins with a brief history of QSAR, including Hansch's early equation relating biological activity to hydrophobicity and electronic characteristics. Methods for calculating properties like log P partitioning are described. Validation of QSAR models involves statistical measures of goodness-of-fit and model stability as well as external validation. Applications of QSAR include drug design, toxicity prediction, and property prediction. Virtual screening uses computational methods to identify potential hit compounds from large databases.
English Tuition | Secondary English Tutor (Deepak Raseo)
You want an experienced and competent English language tutor for your child because you think that a proper tutor and his guidance are a main key to your child's success in the English subject. For more details visit us: http://eduedge-tuition.com/
Transformation module 2 detailing behaviour 4 Feb 16 (Ghazali Md. Noor)
This document provides an overview of strategic planning and cultural transformation tools. It discusses the strategic planning process, including defining a strategic destination, identifying key themes, selecting priority initiatives, and establishing key performance indicators. It also summarizes Abraham Maslow's hierarchy of needs and Richard Barrett's levels of consciousness, which inform the cultural transformation process. The document outlines the stages in developing personal and organizational consciousness from a focus on survival and security to service, collaboration, and making a positive impact.
The document discusses Newton's universal law of gravitation. It explains that gravity produces an attractive force between bodies that depends on their masses and the distance between their centers. It also describes how orbital motion occurs when the speed of an object is great enough that the curvature of the Earth causes the object to fall into orbit rather than hitting the surface. Additionally, it discusses how astronauts can jump higher on the moon due to its weaker gravitational pull compared to Earth.
The document is a presentation for small business owners on key relationships, accounting, and growing a business. It discusses establishing important relationships with lawyers, bankers, accountants, and others. It emphasizes the importance of owning accounting software like QuickBooks and hiring a legitimate accountant. The presentation also covers key business documents needed for partnerships and recommendations for growing a business through local chambers, networking groups, and marketing professionals.
The document provides information on atomic mass and molecular mass. In three sentences:
1) Atomic mass indicates how many times an atom is heavier than 1/12 of carbon-12 and is used to calculate molecular mass.
2) Molecular mass sums the atomic weights of the atoms in a molecule and indicates how many times a molecule is heavier than 1/12 of carbon-12.
3) Avogadro's number (6.02 × 10²³) relates the mass in grams…
This document discusses four strategies that astronomers use to deal with vast arrays of numbers:
1. The metric system which uses factors of 10 to relate units of measurement.
2. Scientific notation which combines powers of 10 with a number's value to efficiently write very large or small numbers.
3. Special units designed for astronomical contexts like light years and parsecs for distance and the sun's luminosity for energy output.
4. Approximation which involves limiting numbers of significant digits and rounding values to focus on order-of-magnitude estimates.
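Strategies 2 and 3 work together naturally in code: the special units are defined once in scientific notation, after which conversions are one-liners. The constants below are standard approximate values:

```python
LIGHT_YEAR_M = 9.461e15  # metres in one light-year (scientific notation)
PARSEC_M = 3.086e16      # metres in one parsec

def ly_to_pc(ly):
    """Convert a distance in light-years to parsecs."""
    return ly * LIGHT_YEAR_M / PARSEC_M

# One parsec is about 3.26 light-years:
print(round(ly_to_pc(3.26), 2))  # -> 1.0
```

Rounding the result to two figures is strategy 4 in action: for order-of-magnitude work, the extra digits carry no useful information.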
Computer simulation in pharmacokinetics and pharmacodynamicsMOHAMMAD ASIM
Computer simulation can be used at four levels in pharmacokinetics and pharmacodynamics: (1) whole organism level using lumped-parameter or physiologically-based pharmacokinetic models, (2) isolated tissue and organ level using distributed blood tissue exchange models, (3) cellular level modeling intracellular and membrane processes, and (4) protein and gene level including computational protein design and models of conditions like HIV viral load.
This document summarizes research applying deep learning techniques to predict epigenomic enhancer regions from DNA sequences. Four models - variations of Basset, DeepSea, DanQ, and a custom Greenside-Basset model - were trained on a dataset from the NIH Roadmap Epigenomics Mapping Consortium to label 1000 base pair sequences as active or inactive for 57 cell types. The Basset variation achieved the best balanced accuracy of 72.8% and had the highest precision and F1 scores, though it still weighted negative examples heavily, as precision was not strong. Experimenting with different embeddings, loss functions, and architectures helped narrow the models best suited for this sequence labeling task.
This document discusses the development of a probabilistic network-level pavement management system using Markov processes. Pavements are grouped into families with similar characteristics. Markov transition probability matrices are developed for each family to model the uncertain deterioration of pavements over time. The Markov models are integrated with dynamic programming and prioritization programs to determine optimal maintenance and rehabilitation recommendations under budget constraints and provide the highest network performance. The developed system is applied to an airport pavement network as an example.
3D QSAR approaches relate the biological activity of compounds to their 3D structural properties using statistical analysis. CoMFA is a commonly used 3D-QSAR method that involves aligning molecules, placing them in a grid, calculating electrostatic and steric field properties at each point, and correlating these descriptors to biological activity using PLS analysis. CoMFA results are often displayed as contour plots that identify regions where certain molecular properties increase or decrease activity. X-ray crystallography and NMR spectroscopy can provide experimental data on bioactive conformations.
1) The document compares the accuracy of empirical (HOSE code, neural network) and quantum-mechanical (QM) methods for predicting 13C NMR chemical shifts.
2) It analyzes 205 molecules where experimental and QM-calculated 13C shifts were published, and calculates shifts using HOSE code, neural network, and QM methods.
3) The mean absolute errors (MAE) were 1.58 ppm for HOSE code, 1.91 ppm for neural network, and 3.29 ppm for QM methods, indicating that the empirical methods provided more accurate predictions for this data set on average.
This document provides an overview of quantitative structure-activity relationship (QSAR) modeling approaches. It discusses various 3D-QSAR methods including contour map analysis, statistical methods like linear regression, principal components analysis, and pattern recognition techniques like cluster analysis and artificial neural networks. The importance of statistical parameters for evaluating and selecting the best QSAR model is also highlighted. In summary, the document outlines different 3D-QSAR modeling techniques, statistical analyses used in QSAR studies, and how statistics help in model selection and evaluation.
Computer simulations in pharmacokinetics and pharmacodynamicsGOKULAKRISHNAN S
This document discusses computer simulations in pharmacokinetics and pharmacodynamics. It describes how whole organism, isolated tissue, and organ simulations work. For whole organism simulations, two approaches are used: lumped-parameter PK-PD modeling which uses differential equations to model the system over time, and physiological modeling which attempts to model interacting organs in detail. Isolated tissue and organ simulations are also discussed, focusing on models of the heart, liver, kidney and brain. The challenges of complexity and model selection are addressed.
Computational modelling of drug disposition lalitajoshi9
computational modelling of drug disposition is the integral part of computer aided drug design. different kinds of tools being used in the prediction of drug disposition in human body. This topic in the CADD explains the details about the drug disposition, active transporters and tools.
Review on Computational Bioinformatics and Molecular Modelling Novel Tool for...ijtsrd
Advancement in science and technology has brought a remarkable change in the field of drug discovery. Earlier it was very difficult to predict the target for receptor but nowadays, it is easy and robust task to dock the target protein with ligand and binding affinity is calculated. Docking helps in the virtual screening of drug along with its hit identification. There are two approaches through which docking can be carried out, shape complementary and stimulation approach. There are many procedures involved in carrying out docking and all require different software's and algorithms. Molecular docking serves as a good platform to screen a large number of ligands and is useful in Drug-DNA studies. This review mainly focuses on the general idea of molecular docking and discusses its major applications, different types of interaction involved and types of docking. Rishabh Jain "Review on Computational Bioinformatics and Molecular Modelling: Novel Tool for Drug Discovery" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-1 , December 2018, URL: http://www.ijtsrd.com/papers/ijtsrd18914.pdf
http://www.ijtsrd.com/pharmacy/pharmacoinformatics/18914/review-on-computational-bioinformatics-and-molecular-modelling-novel-tool-for-drug-discovery/rishabh-jain
This document discusses quantitative structure-activity relationships (QSAR) modeling techniques. It introduces 2D-QSAR which uses molecular descriptors to correlate structure and activity. It also discusses 3D-QSAR techniques like CoMFA and CoMSIA which use 3D molecular fields/properties and statistical methods like PLS to model activity. These techniques are useful for drug design, virtual screening, and predicting absorption, distribution, metabolism, excretion properties.
siRNA has become an indispensable tool for silencing gene expression. It can act as an antiviral agent in the RNAi pathway against plant diseases caused by plant viruses. However, identifying appropriate features for effective siRNA design has become a pressing issue for researchers. Feature selection is a vital pre-processing technique for bioinformatics data sets, used to find the most discriminative information, not only for dimensionality reduction and detection of relevant features but also for minimizing the cost associated with features when designing an accurate learning system. In this paper, we propose an ANN-based feature selection approach using a hybrid GA-PSO for selecting a feature subset by discarding irrelevant features and evaluating the cost of model training. The results showed that the proposed hybrid GA-PSO model outperformed general PSO.
Interaction fingerprint: 1D representation of 3D protein-ligand complexesVladimir Chupakhin
The structural interaction fingerprint (IFP) was introduced to overcome the shortcomings of existing scoring functions. An IFP is a binary string encoding the presence or absence of interactions between a ligand and the amino acids of a protein binding site. It is a convenient way to compare and analyze the binding poses of ligands.
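As a sketch of the idea, a pose's interactions can be flattened into one bit per (residue, interaction type) pair and two poses compared with Tanimoto similarity. The residue names and interaction labels below are invented for illustration, not taken from the IFP paper:

```python
# Minimal structural interaction fingerprint (IFP) sketch: for each
# binding-site residue we record presence/absence of a few interaction
# types as bits, then compare poses with Tanimoto similarity.
# Residue names and interaction labels are hypothetical.

SITE_RESIDUES = ["ASP86", "TYR90", "LYS121"]           # hypothetical site
INTERACTION_TYPES = ["hbond", "hydrophobic", "ionic"]  # one bit each

def encode_ifp(observed):
    """observed: set of (residue, interaction_type) pairs seen in a pose."""
    return [1 if (res, itype) in observed else 0
            for res in SITE_RESIDUES
            for itype in INTERACTION_TYPES]

def tanimoto(a, b):
    """Tanimoto similarity of two equal-length bit vectors."""
    common = sum(x & y for x, y in zip(a, b))
    return common / (sum(a) + sum(b) - common)

pose1 = encode_ifp({("ASP86", "hbond"), ("TYR90", "hydrophobic")})
pose2 = encode_ifp({("ASP86", "hbond"), ("LYS121", "ionic")})
similarity = tanimoto(pose1, pose2)
```

Because the fingerprint is just a fixed-length bit string, standard cheminformatics similarity metrics apply directly, which is what makes pose comparison cheap.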
This document discusses molecular descriptors, which are mathematical representations of molecular properties generated by algorithms. It covers the types of descriptors, including 1D, 2D and 3D descriptors. 1D descriptors represent information from molecular formulas, while 2D descriptors capture size, shape and electronic distribution. 3D descriptors describe properties related to 3D conformation. Tools discussed for calculating descriptors include commercial software as well as open-source options like CDK and PaDEL. The utility of descriptors in applications like QSAR modeling is also mentioned.
3D QSAR techniques like CoMFA and CoMSIA can quantitatively correlate the biological activity of a series of compounds to their 3D molecular properties. CoMFA generates 3D interaction fields around aligned molecules using steric and electrostatic probes, while CoMSIA additionally considers hydrophobic and hydrogen bonding interactions. The document discusses these techniques and provides an example case study applying CoMFA to develop a QSAR model for human eosinophil phosphodiesterase inhibitors with a cross-validated R2 of 0.565. In conclusion, 3D QSAR is a valuable tool for understanding structure-activity relationships and aiding drug design and discovery efforts.
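The cross-validated R² quoted for the CoMFA case study can be illustrated in miniature: q² = 1 - PRESS/SS_total, where PRESS accumulates squared errors of leave-one-out predictions. The sketch below uses a one-descriptor least-squares fit as a stand-in for PLS, on made-up descriptor/activity data:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (a stand-in for PLS)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

def loo_q2(xs, ys):
    """Leave-one-out cross-validated R^2: q2 = 1 - PRESS / SS_total."""
    my = sum(ys) / len(ys)
    press = 0.0
    for i in range(len(xs)):
        # refit without point i, then predict the held-out point
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (a * xs[i] + b)) ** 2
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - press / ss_tot

# Hypothetical descriptor/activity pairs, roughly linear with noise.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.8, 5.1, 5.9]
q2 = loo_q2(x, y)
```

A q² well below the fitted R² is the usual warning sign of an over-fitted QSAR model, which is why the cross-validated value is the one reported.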
Lecture 11 developing qsar, evaluation of qsar model and virtual screeningRAJAN ROLTA
This document discusses the development and evaluation of quantitative structure-activity relationship (QSAR) models. It begins with a brief history of QSAR, including Hansch's early equation relating biological activity to hydrophobicity and electronic characteristics. Methods for calculating properties like log P partitioning are described. Validation of QSAR models involves statistical measures of goodness-of-fit and model stability as well as external validation. Applications of QSAR include drug design, toxicity prediction, and property prediction. Virtual screening uses computational methods to identify potential hit compounds from large databases.
Transformation module 2 detailing behaviour 4 feb 16Ghazali Md. Noor
This document provides an overview of strategic planning and cultural transformation tools. It discusses the strategic planning process, including defining a strategic destination, identifying key themes, selecting priority initiatives, and establishing key performance indicators. It also summarizes Abraham Maslow's hierarchy of needs and Richard Barrett's levels of consciousness, which inform the cultural transformation process. The document outlines the stages in developing personal and organizational consciousness from a focus on survival and security to service, collaboration, and making a positive impact.
The document discusses Newton's universal law of gravitation. It explains that gravity produces an attractive force between bodies that depends on their masses and the distance between their centers. It also describes how orbital motion occurs when the speed of an object is great enough that the curvature of the Earth causes the object to fall into orbit rather than hitting the surface. Additionally, it discusses how astronauts can jump higher on the moon due to its weaker gravitational pull compared to Earth.
The document is a presentation for small business owners on key relationships, accounting, and growing a business. It discusses establishing important relationships with lawyers, bankers, accountants, and others. It emphasizes the importance of owning accounting software like QuickBooks and hiring a legitimate accountant. The presentation also covers key business documents needed for partnerships and recommendations for growing a business through local chambers, networking groups, and marketing professionals.
This document provides information on atomic mass and molecular mass. In three sentences:
1) The atomic mass indicates how many times an atom is heavier than 1/12 of carbon-12 and is used to calculate the molecular mass.
2) The molecular mass sums the atomic weights of the atoms in a molecule and indicates how many times a molecule is heavier than 1/12 of carbon-12.
3) Avogadro's number (6.02 × 10²³) relates the mass in grams…
This document discusses four strategies that astronomers use to deal with vast arrays of numbers:
1. The metric system which uses factors of 10 to relate units of measurement.
2. Scientific notation which combines powers of 10 with a number's value to efficiently write very large or small numbers.
3. Special units designed for astronomical contexts like light years and parsecs for distance and the sun's luminosity for energy output.
4. Approximation which involves limiting numbers of significant digits and rounding values to focus on order-of-magnitude estimates.
The document outlines principles of assessment including that assessment should be valid in measuring intended skills, reliable with clear processes, and transparent with accessible information. Assessment should also be inclusive, an integral part of program design relating to goals, and manage a reasonable workload. Both formative and summative assessment as well as timely feedback should be included, and staff development should cover assessment competencies.
Creating a Video Game (Jhoustin-Alexander)Jhoustin12
This document describes the steps to create a video game, including defining an idea and objectives, developing a script, programming, market considerations, and distribution. It explains that first an original idea and objectives are defined, then a script detailing the story and gameplay is written, then the game is programmed, and finally the target market is considered and the finished product is distributed.
This document provides tips for becoming a better photographer. It discusses learning your camera by reading the manual so you understand basic operations. It covers composition techniques like using the viewfinder for stabilization. The rule of thirds for image placement is explained, along with other composition tips. Shutter speed and aperture/exposure controls are discussed in relation to lighting conditions and subject matter. Other topics include depth of field, white balance, ISO, and factors to consider when purchasing a camera. The document emphasizes practicing these techniques and reading over the material to prepare for an assessment quiz.
This document describes a new recursive Monte Carlo simulation algorithm called the Sampled Path Set Algorithm (SPSA) for modeling complex k-out-of-n reliability systems. The SPSA uses a graph representation of a reliability block diagram and recursively searches the graph to determine system response based on the system state vector at each simulation iteration, allowing modeling of systems with general component failure and repair distributions and large numbers of components. Existing methods for analyzing such systems using tie/cut sets have limitations as the number of sets grows non-linearly with increased system complexity. The SPSA provides a more efficient alternative with linear growth in processing and memory requirements.
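The core Monte Carlo step the abstract describes, drawing a component state vector each iteration and testing the system success criterion, can be sketched generically for a k-out-of-n system. The SPSA graph search itself is not reproduced here, and i.i.d. component reliabilities are an assumption made for the illustration:

```python
import random
from math import comb

def exact_k_out_of_n(n, k, p):
    """Exact probability that at least k of n i.i.d. components work."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def mc_k_out_of_n(n, k, p, trials=20000, seed=1):
    """Monte Carlo estimate: sample a component state vector per iteration
    and check the k-out-of-n success criterion. (SPSA would instead walk a
    reliability block diagram graph at this point.)"""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        working = sum(1 for _ in range(n) if rng.random() < p)
        if working >= k:
            hits += 1
    return hits / trials

estimate = mc_k_out_of_n(n=5, k=3, p=0.9)
exact = exact_k_out_of_n(5, 3, 0.9)
```

For general (non-exponential) failure and repair distributions the closed form above no longer exists, which is exactly where simulation approaches such as SPSA earn their keep.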
Comparative Protein Structure Modeling and itsApplicationsLynellBull52
Comparative Protein Structure Modeling and its Applications to Drug Discovery
Matthew Jacobson¹ and Andrej Sali¹,²
¹Department of Pharmaceutical Chemistry, California Institute for Quantitative Biomedical Research, Mission Bay Genentech Hall, 600 16th Street, University of California, San Francisco, CA 94143-2240, USA
²Department of Biopharmaceutical Sciences, California Institute for Quantitative Biomedical Research, Mission Bay Genentech Hall, 600 16th Street, University of California, San Francisco, CA 94143-2240, USA
Contents
1. Introduction
2. Fold assignment and sequence-structure alignment
3. Comparative model building
4. Loop modeling
5. Sidechain modeling
6. Comparative modeling by MODELLER
7. Physics-based approaches to comparative model construction and refinement
8. Accuracy of comparative models
9. Modeling on a genomic scale
10. Applications of comparative modeling to drug discovery
10.1. Comparative models vs experimental structures in virtual screening
10.2. Use of comparative models to obtain novel drug leads
10.3. Comparative models of kinases in virtual screening
10.4. GPCR comparative models for drug development
10.5. Other uses of comparative models in drug development
10.6. Future directions
11. Conclusions
References
1. INTRODUCTION
Homology or comparative protein structure modeling constructs a three-dimensional
model of a given protein sequence based on its similarity to one or more known
structures. In this perspective, we begin by describing the comparative modeling
technique and the accuracy of the models. We then discuss the significant role that
comparative prediction plays in drug discovery. We focus on virtual ligand screening
against comparative models and illustrate the state-of-the-art by a number of specific
examples.
The genome sequencing efforts are providing us with complete genetic blueprints for
hundreds of organisms, including humans. [ANNUAL REPORTS IN MEDICINAL CHEMISTRY, VOLUME 39 © 2004 Elsevier Inc. ISSN: 0065-7743, DOI: 10.1016/S0065-7743(04)39020-2. All rights reserved.] We are now faced with describing, controlling, and modifying the functions of proteins encoded by these genomes. This
task is generally facilitated by protein three-dimensional structures [1], which are best
determined by experimental methods such as X-ray crystallography and nuclear
magnetic resonance (NMR) spectroscopy. Despite significant advances in these
techniques, many protein sequences are not easily accessible to structure determination
by experiment. Over the last two years, the number of sequences in the comprehensive
public sequence databases, such as SwissProt/TrEMBL [2] and GenPept [3], increased
by a factor of 2.3 from 522,959 to 1,215,803 on 26 April 2004. In contrast, despite
structural genomics, the number of experimentally determined structures deposited in
the Protein Data Bank (PDB) increas ...
Csit65111ASSOCIATIVE REGRESSIVE DECISION RULE MINING FOR ASSOCIATIVE REGRESSI...cscpconf
Opinion mining, also known as sentiment analysis, involves customer satisfaction patterns, sentiments, and attitudes toward entities, products, services, and their attributes. With the rapid development of the Internet, potential customers provide a large volume of product/service reviews. A high volume of customer reviews was processed through taxonomy-aware processing, but it was difficult to identify the best reviews. In this paper, an Associative Regression Decision Rule Mining (ARDRM) technique is developed to predict patterns for service providers and to improve customer satisfaction based on review comments. Associative Regression based Decision Rule Mining performs two steps to improve the customer satisfaction level. Initially, a Machine Learning Bayes Sentiment Classifier (MLBSC) is used to assign class labels to each service review. After that, the regressive factor of the opinion words and the class labels are checked for association between the words using various probabilistic rules. Based on these rules, the effect of opinions and sentiments on customer reviews is analyzed to arrive at the specific set of services preferred by customers, together with their review comments. The associative regressive decision rule helps the service provider take decisions on improving the customer satisfaction level. The experimental results reveal that the ARDRM technique improved performance in terms of true positive rate, associative regression factor, regressive decision rule generation time, and review detection accuracy for similar patterns.
Associative Regressive Decision Rule Mining for Predicting Customer Satisfact...csandit
A TWO-STAGE HYBRID MODEL BY USING ARTIFICIAL NEURAL NETWORKS AS FEATURE CONST...IJDKP
We propose a two-stage hybrid approach with neural networks as the new feature-construction algorithm for bankcard response classification. The hybrid model uses a very simple neural network structure as the feature-construction tool in the first stage; the newly created features are then used as additional input variables in logistic regression in the second stage. The model is compared with the traditional one-stage model in credit customer response classification. It is observed that the proposed two-stage model outperforms the one-stage model in terms of accuracy, area under the ROC curve, and the KS statistic. By creating new features with the neural network technique, the underlying nonlinear relationships between variables are identified. Furthermore, by using a very simple neural network structure, the model overcomes the drawbacks of neural networks in terms of long training time, complex topology, and limited interpretability.
Threshold benchmarking for feature ranking techniquesjournalBEEI
In prediction modeling, the choice of features from the original feature set is crucial for accuracy and model interpretability. Feature ranking techniques rank features by importance, but there is no consensus on the cut-off number of features. Thus, it becomes important to identify a threshold value or range for removing redundant features. In this work, an empirical study is conducted to identify a threshold benchmark for feature ranking algorithms. Experiments are conducted on the Apache Click dataset with six popularly used ranker techniques and six machine learning techniques, to deduce a relationship between the total number of input features (N) and the threshold range. The area-under-the-curve analysis shows that ≃ 33-50% of the features are necessary and sufficient to yield a reasonable performance measure, with a variance of 2%, in defect prediction models. Further, we find that log2(N) as the ranker threshold value represents the lower limit of the range.
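The reported thresholds, keep roughly 33-50% of the N ranked features with log2(N) as the lower limit, reduce to a small calculation. A sketch (the rounding choices are an assumption; the study reports a range, not an exact formula):

```python
from math import ceil, log2

def ranker_threshold_range(n_features):
    """Feature-count cut-offs suggested by the study: keep roughly
    33-50% of the N ranked features, with log2(N) as the lower limit."""
    lower = max(ceil(log2(n_features)), 1)   # log2(N) lower limit
    low_pct = ceil(0.33 * n_features)        # ~33% of N
    high_pct = ceil(0.50 * n_features)       # ~50% of N
    return lower, low_pct, high_pct

# e.g. for a 64-feature dataset
lower, low_pct, high_pct = ranker_threshold_range(64)
```

So for 64 ranked features one would retain somewhere between 6 (the log2 floor) and about 22-32 features before performance plateaus, under the study's findings.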
The document proposes a Response Aware Probabilistic Matrix Factorization (RAPMF) framework to address limitations in existing collaborative filtering recommendation systems. RAPMF incorporates users' response patterns into probabilistic matrix factorization by modeling responses as a Bernoulli distribution for observed ratings and a step function for unobserved ratings. This allows marginalizing missing responses. The authors also develop a mini-batch implementation of RAPMF to reduce computational costs from O(N×M) to O(B²) for mini-batches of B users and items. Experimental evaluation on synthetic and real-world datasets demonstrates the merits of RAPMF, including improved performance and reduced training time compared to other methods.
This document describes an Excel spreadsheet and VBA macro that implements all-possible-subsets (APS) regression for model selection and predictor importance analysis. The spreadsheet can handle up to 20 predictor variables and reports the best regression model for each subset size based on measures like R-squared and Mallows' Cp. It also calculates a general dominance measure to determine the relative importance of each predictor variable. The spreadsheet has been used successfully in marketing courses to help students select the best regression models from customer satisfaction data and understand which predictor variables drive customer satisfaction the most.
A Threshold Fuzzy Entropy Based Feature Selection: Comparative StudyIJMER
Feature selection is one of the most common and critical tasks in database classification. It
reduces the computational cost by removing insignificant and unwanted features. Consequently, this
makes the diagnosis process accurate and comprehensible. This paper presents the measurement of
feature relevance based on fuzzy entropy, tested with Radial Basis Classifier (RBF) network,
Bagging(Bootstrap Aggregating), Boosting and stacking for various fields of datasets. Twenty
benchmarked datasets which are available in UCI Machine Learning Repository and KDD have been
used for this work. The accuracy obtained from these classification processes shows that the proposed method is capable of producing good and accurate results with fewer features than the original datasets.
In order to provide sensing services to low-powered IoT devices, wireless sensor networks (WSNs) organize specialized transducers into networks. Energy usage is one of the most important design concerns in WSN because it is very hard to replace or recharge the batteries in sensor nodes. For an energy-constrained network, the clustering technique is crucial in preserving battery life. By strategically selecting a cluster head (CH), a network's load can be balanced, resulting in decreased energy usage and extended system life. Although clustering has been predominantly used in the literature, the concept of chain-based clustering has not yet been explored. As a result, in this paper, we employ a chain-based clustering architecture for data dissemination in the network. Furthermore, for CH selection, we employ the coati optimisation algorithm, which was recently proposed and has demonstrated significant improvement over other optimization algorithms. In this method, the parameters considered for selecting the CH are energy, node density, distance, and the network’s average energy. The simulation results show tremendous improvement over the competitive cluster-based routing algorithms in the context of network lifetime, stability period (first node dead), transmission rate, and the network's power reserves.
The document discusses computer-aided drug design (CADD) and molecular docking techniques. It begins by providing background on the emergence of CADD in the 1980s. It then describes how docking involves computationally sampling ligand conformations in a protein binding site to find an optimized orientation that minimizes free energy. Key components of docking simulations are representation of molecules, conformational searching, and scoring functions to rank results. Rigid docking considers only geometric complementarity while flexible docking accounts for some ligand and protein flexibility using different search algorithms.
HEART DISEASE PREDICTION USING MACHINE LEARNING TECHNIQUESIRJET Journal
This study compares various machine learning techniques for predicting cardiovascular disease using a dataset of over 3,600 records. Conventional models like AdaBoost Classifier and HistgradientBoosting Classifier performed best with accuracies of 91% and 89% respectively. When AUC-ROC is considered, HistgradientBoosting Classifier has the highest value of 0.96. Novel approaches like TabNet and Restricted Boltzmann Machine (RBM) are also explored, with TabNet achieving good performance. Overall, AdaBoost Classifier and HistgradientBoosting Classifier produced the most accurate results for cardiovascular disease prediction.
This document compares classification and regression models using the CARET package in R. Four classification algorithms are evaluated on Titanic survival data and three regression algorithms are evaluated on property liability data. For classification, random forests performed best based on the F-measure metric. For regression, gradient boosted models performed best based on RMSE. The document concludes classification can predict Titanic survivor characteristics while regression can predict property hazards.
Prediction of pIC50 Values for the Acetylcholinesterase (AChE) using QSAR ModelIRJET Journal
This document describes using a random forest regression model to predict pIC50 values, which indicate how strongly compounds bind to the acetylcholinesterase enzyme, for novel drugs that could treat Alzheimer's disease. The model is trained on a dataset of over 5,000 compounds from the ChEMBL database with known pIC50 values. Fingerprint descriptors of the compounds are used as predictors in the random forest model. The model is validated using statistical metrics and deployed using a web application to allow users to upload compounds and receive predicted pIC50 values.
A MODEL-BASED APPROACH MACHINE LEARNING TO SCALABLE PORTFOLIO SELECTIONIJCI JOURNAL
This study proposes a scalable asset selection and allocation approach using machine learning that
integrates clustering methods into portfolio optimization models. The methodology applies the Uniform
Manifold Approximation and Projection method and ensemble clustering techniques to preselect assets
from the Ibovespa and S&P 500 indices. The research compares three allocation models and finds that the
Hierarchical Risk Parity model outperformed the others, with a Sharpe ratio of 1.11. Despite the
pandemic's impact on the portfolios, with drawdowns close to 30%, they recovered in 111 to 149 trading
days. The portfolios outperformed the indices in cumulative returns, with similar annual volatilities of
20%. Preprocessing with UMAP allowed for finding clusters with higher discriminatory power, evaluated
through internal cluster validation metrics, helping to reduce the problem's size during optimal portfolio
allocation. Overall, this study highlights the potential of machine learning in portfolio optimization,
providing a useful framework for investment practitioners.
Automatic customer review summarization using deep learningbased hybrid senti...IJECEIAES
Customer review summarization (CRS) offers business owners summarized customer feedback. The functionality of CRS mainly depends on the sentiment analysis (SA) model; hence it needs an efficient SA technique. The aim of this study is to construct an SA model employing deep learning for CRS (SADL-CRS) to present summarized data and assist businesses in understanding the behavior of their customers. The SA model employing deep learning (SADL) and CRS phases make up the proposed automatic SADL-CRS model. The SADL consists of review preprocessing, feature extraction, and sentiment classification. The preprocessing stage removes irrelevant text from the reviews using natural language processing (NLP) methods. The proposed hybrid approach combines review-related features and aspect-related features to efficiently extract the features and create a unique hybrid feature vector (HF) for each review. The classification of input reviews is performed using a deep learning (DL) classifier, long short-term memory (LSTM). The CRS phase performs the automatic summarization employing the outcome of SADL. The experimental evaluation of the proposed model is done using diverse research data sets. The SADL-CRS model attains average recall, precision, and F1-score of 95.53%, 95.76%, and 95.06%, respectively. The review summarization efficiency of the suggested model is improved by 6.12% compared to underlying CRS methods.
A parsimonious SVM model selection criterion for classification of real-world ...o_almasi
This paper proposes and optimizes a two-term cost function, consisting of a sparseness term and a generalized v-fold cross-validation term, using a new adaptive particle swarm optimization (APSO). APSO updates its parameters adaptively based on dynamic feedback from the success rate of each particle's personal best. Since the proposed cost function is based on choosing fewer support vectors, the complexity of the SVM models decreases while accuracy remains in an acceptable range. Therefore, testing time decreases, making SVM more applicable for practical applications on real data sets. A comparative study on data sets from the UCI database is performed between the proposed cost function and the conventional cost function to demonstrate the effectiveness of the proposed cost function.
The document summarizes five papers that address challenges in context-aware recommendation systems using factorization methods. Three key challenges are high dimensionality, data sparsity, and cold starts. The papers propose various algorithms using matrix factorization and tensor factorization to address these challenges. COT models each context as an operation on user-item pairs to reduce dimensionality. Another approach extracts latent contexts from sensor data using deep learning and matrix factorization. CSLIM extends the SLIM algorithm to incorporate contextual ratings. TAPER uses tensor factorization to integrate various contexts for expert recommendations. Finally, GFF provides a generalized factorization framework to handle different recommendation models. The document analyzes how well each paper meets the challenges.
Design and Optimization of Supply Chain Network with Nonlinear Log-Space Mode...RSIS International
This proposal intends to address practical vulnerabilities in state-of-the-art models for the optimal design of supply chain networks. While conventional models attempt to transform the design constraints of the supply chain network into a sum of cost functions, the proposed model will transform the cost function into a nonlinear subspace. Moreover, the subspace will be optimized on a logarithmic scale, so that multiple network constraints such as stock transportation, inventory, echelon levels, and backorders can be mapped within the subspace. Subsequently, robust biologically inspired optimization algorithms will be proposed. These algorithms will incorporate adaptiveness so that the nonlinear cost function can be solved effectively. The adaptiveness will be based mainly on the ability to handle all network constraints, such as echelon levels, inventory, etc.
efforts in traditional structural biology and the structural genomics projects that aim at high-
throughput complex structure determination [1], the latest statistics from 3did database [2]
show that only 7% of the known protein interactions in humans have an associated experimen-
tal complex structure. Thus, there is great need for computational methods that predict new
interactions and produce high-resolution structural modeling of PPIs. To evaluate the perfor-
mance of computational methods the quality of the PPI models produced by these methods
need to be assessed by comparing their structural similarity to the experimentally solved native
structures (targets). In contrast to the protein structure prediction field, where there are several
widely accepted quality measures, e.g., Cα-RMSD, GDT_TS [3], MaxSub [4], TM-score [5],
and S-score [6], the IS-score [7] for assessing protein complex models has not achieved wide adoption in the field, and the current state-of-the-art evaluation protocol for assessing the quality of docking models is still based on three distinct though related measures, namely Fnat,
LRMS and iRMS as proposed and standardized by the Critical Assessment of PRedicted Inter-
actions (CAPRI) community [8]. To calculate these measures, the interface between the two
interacting protein molecules (receptor and ligand) is defined as any pair of heavy atoms from
the two molecules within 5Å of each other. Fnat is then defined as the fraction of native interfa-
cial contacts preserved in the interface of the predicted complex. LRMS is the Ligand Root
Mean Square deviation calculated for the backbone of the shorter chain (ligand) of the model
after superposition of the longer chain (receptor) [9]. For the third measure, iRMS, the receptor-ligand interface in the target (native) is redefined at a relatively relaxed atomic contact cut-off of 10 Å, which is twice the value used to define inter-residue 'interface' contacts in the case of Fnat. The backbone atoms of these 'interface' residues are then superposed on their equivalents in the predicted complex (model) to compute the iRMS [9]. For a detailed pictorial description
of all quality measures, see the original reference [10]. The CAPRI evaluation use different cut-
offs on these three measures to assign predicted docking models into the four quality classes:
Incorrect (Fnat < 0.1 or (LRMS > 10.0 and iRMS > 4.0)), Acceptable ((Fnat ≥ 0.1 and Fnat < 0.3) and (LRMS ≤ 10.0 or iRMS ≤ 4.0), or (Fnat ≥ 0.3 and LRMS > 5.0 and iRMS > 2.0)), Medium ((Fnat ≥ 0.3 and Fnat < 0.5) and (LRMS ≤ 5.0 or iRMS ≤ 2.0), or (Fnat ≥ 0.5 and LRMS > 1.0 and iRMS > 1.0)), or High (Fnat ≥ 0.5 and (LRMS ≤ 1.0 or iRMS ≤ 1.0)) [10]. While this classification has been useful for the purpose of CAPRI, it is not as detailed as the quality measures
used in the protein structure prediction field, e.g. TM-score and GDT_TS. It is, for instance, difficult to directly correlate the CAPRI classification with any scoring function that tries to estimate the accuracy of docking models. Thus, the scoring part of CAPRI [8] and benchmarks of scoring functions for docking [10–13] focus only on the ability to select good models according to the CAPRI classification, completely ignoring the potentially useful information in the lower-ranked models when assessing the ability to estimate the true model quality. Thus, there
is a need to design a single robust continuous quality estimate covering all the different structural attributes captured individually by the CAPRI measures. In this study we derive such a continuous quality measure, DockQ, for docking models, which, instead of classifying models into different quality groups, combines Fnat, LRMS, and iRMS to yield a score in the range [0, 1], corresponding to low and high quality, respectively. This new measure can essentially be used to recapitulate the original CAPRI classification, and for more detailed analyses of similarity and prediction performance.
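For illustration, the four-class assignment defined by the CAPRI cutoffs above can be sketched as a small function. This is an illustrative reimplementation of the published thresholds, not the official CAPRI evaluation code:

```python
def capri_class(fnat, lrms, irms):
    """Assign a docking model to one of the four CAPRI quality classes
    from Fnat, LRMS (Angstrom) and iRMS (Angstrom), using the cutoffs
    from [10]. Classes are checked from best to worst, first match wins."""
    if fnat >= 0.5 and (lrms <= 1.0 or irms <= 1.0):
        return "High"
    if (0.3 <= fnat < 0.5 and (lrms <= 5.0 or irms <= 2.0)) or \
       (fnat >= 0.5 and lrms > 1.0 and irms > 1.0):
        return "Medium"
    if (0.1 <= fnat < 0.3 and (lrms <= 10.0 or irms <= 4.0)) or \
       (fnat >= 0.3 and lrms > 5.0 and irms > 2.0):
        return "Acceptable"
    return "Incorrect"
```

Note that a model with Fnat ≥ 0.5 but both RMS values above their High-quality cutoffs falls back to Medium, as the second clause of the Medium criterion specifies.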
Furthermore, the recent growth in using machine learning methods to score models would not have been possible without the development of single quality measures, such as TM-score, GDT_TS and S-score, to serve as target functions. These methods have been successful in CASP for predicting the quality of protein structure models [14,15], and there is no reason to believe that they will not be as successful in predicting the quality of docking models.
Although the individual CAPRI measures (Fnat, LRMS, iRMS) as well as the classification into
DockQ Protein-Protein Model Quality
PLOS ONE | DOI:10.1371/journal.pone.0161879 August 25, 2016 2 / 9
incorrect, acceptable, medium and high quality models could potentially be used as target functions in regression or classification schemes, it is natural and more convenient to use the combined single measure, DockQ, which covers all the different quality attributes captured by the
individual CAPRI measures. The potential use of DockQ as a target function in the design of a
docking scoring function by training support vector regression machines to predict quality of
docking models has already been demonstrated in a separate study [16].
Materials and Methods
Training set
A set from a recent benchmark of docking scoring functions [13] (the MOAL-set) was used to design and optimize DockQ. This set contained 56,015 docking models for 118 targets from the protein-protein docking Benchmark 4.0 [17], constructed using SwarmDock [18] and graciously provided by the authors of Moal et al. [13]. The set contained 54,324 incorrect, 762 acceptable, 855 medium, and 74 high quality models.
Testing set
For independent testing, a subset based on the CAPRI Score_set [19] (http://cb.iri.univ-lille1.fr/Users/lensink/Score_set/) containing models submitted to CAPRI between 2005 and 2014, with their respective CAPRI quality measures (Fnat, LRMS, iRMS), was assembled. For simplicity, two targets with multiple correct chain packings, i.e. the same sequence binding at two different locations, were removed (Target 37: 2W83; Target 40: 3E8L). The final CAPRI-set contained 13,849 incorrect, 632 acceptable, 565 medium, and 282 high quality models, 15,328 in total.
Performance Measures
Matthews Correlation Coefficient (MCC) is defined by

MCC = (TP × TN − FP × FN) / [(TP + FP)(TP + FN)(TN + FP)(TN + FN)]^(1/2)

where TP, FP, TN and FN refer to True Positives, False Positives, True Negatives and False Negatives, respectively. MCC is defined in the range of -1 (perfect anti-correlation) to 1 (perfect correlation).
Precision (PPV) is the ratio of the true positives predicted at a given cutoff and the total
number of test outcome positives (including both true and false positives) determined at the
same cutoff. Thus, PPV = TP/(TP+FP).
Recall (TPR) is the number of true positives predicted at a given cutoff divided by the total
number of positives (P = TP + FN) in the set. Thus, TPR = TP/P.
F1-score is the harmonic mean of PPV and TPR, and can be interpreted as a trade-off between the two. It is defined by the following equation: F1 = 2 × PPV × TPR / (PPV + TPR).
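The four measures above follow directly from the confusion-matrix counts; a minimal sketch (returning 0.0 in the degenerate empty-denominator cases, a convention not specified in the text):

```python
import math

def confusion_metrics(tp, fp, tn, fn):
    """Compute MCC, precision (PPV), recall (TPR) and F1 from
    true/false positive and negative counts, as defined above."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    tpr = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * ppv * tpr / (ppv + tpr) if ppv + tpr else 0.0
    return mcc, ppv, tpr, f1
```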
Transforming LRMS and iRMS
To avoid the problem of arbitrarily large RMS values that are essentially equally bad, RMS values were scaled using the inverse-square scaling technique adapted from the S-score formula [6]:
RMSscaled(RMS, di) = 1 / (1 + (RMS / di)^2)    (1)
where RMSscaled represents the scaled RMS deviation corresponding to either of the two terms, LRMS or iRMS, and di is a scaling factor, d1 for LRMS and d2 for iRMS, optimized to d1 = 8.5Å and d2 = 1.5Å (see Results and Discussion).
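In code, Eq 1 is a one-liner; the sketch below assumes RMS values in Å and returns a score in (0, 1] that equals 0.5 exactly when RMS equals the scaling factor di:

```python
def rms_scaled(rms, d):
    """Inverse-square scaling of an RMS deviation (Eq 1): maps
    [0, inf) onto (0, 1], with value 0.5 at rms == d and values
    approaching 0 for arbitrarily large deviations."""
    return 1.0 / (1.0 + (rms / d) ** 2)
```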
The hallmark of inverse-square scaling is the smooth asymptotic decline of the scaled score (Y) as the raw score (X) increases (Figure A in S1 File). Conversely, the sensitivity of the scaled score to the raw score (dY/dX) is largest at the lower end of X, reflected in a significantly steeper slope at the higher end of Y (say, Y ≥ 0.5). The scaling technique thus makes the scaled RMSD functions considerably more sensitive in discriminating between 'good' models that vary slightly in relative quality (e.g., acceptable vs. medium, or medium vs. high), while the function stays close to zero for all kinds of 'bad' (incorrect) models regardless of their relative quality.
Results and Discussion
The aim of this study was to derive a continuous quality measure that can be used to rank
docking models and compare performances of methods scoring docking models in a direct
way. To make it simple and to promote wide acceptance, we chose to base the scoring function, named DockQ, on the already established quality measures for docking, Fnat, LRMS, and iRMS, used in CAPRI [8] and other benchmarks [13]. In the DockQ score, Fnat, LRMS, and iRMS are combined into one score as the mean of Fnat and the two RMS values scaled according to Eq 1:
DockQ(Fnat, LRMS, iRMS, d1, d2) = (Fnat + RMSscaled(LRMS, d1) + RMSscaled(iRMS, d2)) / 3    (2)
where RMSscaled(RMS, d) is defined in Eq 1, and d1 and d2 are scaling parameters that determine how fast large RMS values are scaled towards zero; they need to be set based on the score ranges for LRMS and iRMS. The advantage of the non-linear scaling of the RMS values is that the function (Eq 2) only contains terms between 0 and 1, all with the same dependence on quality: the higher, the better. Perhaps even more important is that RMS values that should be considered equally bad, e.g. an iRMS of 7Å or 14Å, get essentially the same low RMSscaled score.
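Combining Eq 1 and Eq 2 with the optimized scaling factors d1 = 8.5Å and d2 = 1.5Å gives a compact sketch of the full score:

```python
def dockq(fnat, lrms, irms, d1=8.5, d2=1.5):
    """DockQ (Eq 2): mean of Fnat and the two inverse-square-scaled
    RMS terms (Eq 1). All three terms lie in [0, 1], higher is better."""
    def scaled(rms, d):  # Eq 1
        return 1.0 / (1.0 + (rms / d) ** 2)
    return (fnat + scaled(lrms, d1) + scaled(irms, d2)) / 3.0
```

For a perfect model (Fnat = 1, both RMS values 0) the score is exactly 1, while grossly wrong models approach 0.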
Optimizing d1 and d2
The two parameters in the DockQ score, d1 and d2, were optimized in a grid search on the
MOAL-set by calculating Eq 2 for all pairs of d1 and d2 in the range 0.5 to 10Å for d1, and 0.5 to
5Å for d2, in steps of 0.5Å. For each (d1, d2) pair, the ability to separate the models according to the CAPRI classification was assessed by first defining the three cutoffs, C1, C2, and C3, that optimized the Matthews correlation coefficient (MCC) between Incorrect and Acceptable (C1), Acceptable and Medium (C2), and Medium and High (C3), respectively. The optimized cutoffs
were used to calculate an F1-score for the classification performance for each of the four differ-
ent classes. Finally, the average F1-score was used to measure the overall classification perfor-
mance and to decide on d1 and d2. The maximum average F1-score (0.91) was obtained
for d1 = 8.5Å and d2 = 1.5Å (Figure B in S1 File), corresponding to the cutoffs C1 = 0.23,
C2 = 0.49, and C3 = 0.80 (Fig 1A).
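The grid search can be sketched as below. Here `f1_for_params` is a hypothetical stand-in for the full pipeline (score all models at a given (d1, d2), find the MCC-optimal cutoffs C1, C2, C3, and average the four per-class F1-scores); the real optimization of course requires the MOAL-set models:

```python
import itertools

def grid_search(models, f1_for_params):
    """Enumerate (d1, d2) pairs on the 0.5 Angstrom grid described
    above and return the pair maximizing the average F1-score
    reported by f1_for_params(models, d1, d2)."""
    d1_grid = [0.5 * i for i in range(1, 21)]  # 0.5 .. 10.0 A
    d2_grid = [0.5 * i for i in range(1, 11)]  # 0.5 .. 5.0 A
    return max(itertools.product(d1_grid, d2_grid),
               key=lambda p: f1_for_params(models, *p))
```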
Independent Benchmark
The CAPRI-set was used as an independent benchmark to assess DockQ performance and
compare it to IS-score, which is similar in its design to TM-score for protein structure predic-
tion, but for interfaces. The cutoffs optimized on the MOAL-set are close to optimal for the CAPRI-set as well, within ±0.02 (Fig 1B), showing that cutoffs optimized on the MOAL-set can also be used on the CAPRI-set. However, the main purpose of giving the cutoffs is to show the general
correspondence between DockQ and the CAPRI classification, not to use the cutoffs for classification. Even though DockQ and IS-score have an overall Pearson correlation of 0.98 (Fig 2), the ability to reproduce the CAPRI classification is much better for DockQ. As illustrated in Fig 2, the separation between the different quality classes is much cleaner according to DockQ, while the classes overlap much more based on IS-score; e.g. an IS-score of 0.5 corresponds to models in three different classes: Acceptable, Medium and High. Precision (PPV) vs. recall
(TPR) curves were then constructed to compare, in greater detail, the ability of the two methods
to classify the models with respect to their original CAPRI classification: Acceptable or better,
Medium or better and High, by varying the cutoffs in the whole range [0, 1] of DockQ and IS-
score (Fig 3). The areas under the curves (AUC) for DockQ (0.98, 0.99, 0.97) show almost perfect agreement with respect to the original CAPRI classification. This is true across all quality
classes; e.g. the PPV for DockQ at a recall of 90% is 95%, 97%, and 91% for Acceptable,
Medium and High respectively, while the PPV for IS-score is 71%, 72%, and 66% at the same
recall. It is no surprise that agreement with CAPRI classification is exceptionally good for
DockQ, since it uses the CAPRI measures to derive the score. In fact, the average DockQ for false predictions is within ±0.02 of the cutoff for a particular class, which means that most false predictions are borderline cases. This is of course a consequence of classifying models into different quality bins: for instance, taking the original CAPRI classification as the gold standard, the average iRMS for Medium quality models incorrectly classified as High by DockQ is 1.05Å, and for High quality models incorrectly classified as Medium it is 0.93Å, while the cutoff in iRMS between Medium and High quality is 1.0Å (Medium > 1.0Å; High ≤ 1.0Å) according to CAPRI [10]. In any classification scheme there will be borderline cases, where virtually identical models are classified differently. This highlights, yet again, the importance of using continuous measures like DockQ or IS-score, which do not exhibit the same problems.
Software features for handling multiple interacting chains
Assessing dimer quality is a current challenge in CASP and CAPRI. In view of this, the DockQ software has been built with the functionality to handle multiple interacting chains.
Monomer-dimer or dimer-dimer interfaces are common in, for example, antigen-antibody
Fig 1. Distribution of DockQ score for Incorrect, Acceptable, Medium, and High quality models, respectively, for (A) the MOAL-set and (B) the CAPRI-set. The colored bars corresponding to each CAPRI class represent the frequency distribution of models predicted to fall in a particular class, normalized by the total number of models in that class.
doi:10.1371/journal.pone.0161879.g001
Fig 2. Scatter plot of IS-score vs. DockQ on the CAPRI-set. Models are colored according to CAPRI classification as Incorrect (blue), Acceptable (cyan),
Medium (red), High (green). The overall correlation is R = 0.98, while the correlation within the different quality classes is 0.77, 0.82, 0.90, and 0.65,
respectively.
doi:10.1371/journal.pone.0161879.g002
Fig 3. Precision (PPV) vs. Recall plots for the ability of DockQ and IS-score to separate models with (A) Acceptable or better, (B) Medium or
better, and (C) High quality, respectively, on the CAPRI-set. The areas under the curves (AUC) for DockQ and IS-score are (0.98, 0.99, 0.97) and (0.89, 0.92, 0.82), respectively, for (A) Acceptable, (B) Medium and (C) High.
doi:10.1371/journal.pone.0161879.g003
interactions, due to the internal symmetry in the biological assembly of the heavy and light variable chains of the immunoglobulin, where the partner antigen can potentially bind asymmetrically at the antigen binding sites [20]. This is also common in molecular recognition involving Major Histocompatibility Complexes in antigen-presenting cells [21], nuclear transport and other signal transduction pathways [22]. Multimeric biological assemblies of higher order than dimers also occur, and are particularly common in viral envelopes/capsids [23], viral glycoproteins [24,25] and cytoplasmic subunits of voltage-gated channels [26]. To this end, the software has been built with the functionality to handle all different possible combinations of chains specified in the two inputs (native, model) with appropriate command-line options, without the need to merge the chains manually beforehand. It also has the option to try out different chain-order combinations to find the best-matching DockQ score if there are multiple symmetric correct solutions.
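The symmetry handling can be sketched as an exhaustive search over chain correspondences. Here `score_fn` is a hypothetical stand-in for a full DockQ evaluation under a given model-to-native chain mapping; the factorial number of permutations is manageable for the small chain counts involved:

```python
import itertools

def best_chain_mapping(model_chains, native_chains, score_fn):
    """Try every mapping of model chains onto native chains and keep
    the one giving the highest DockQ-like score."""
    best_map, best_score = None, float("-inf")
    for perm in itertools.permutations(model_chains):
        mapping = dict(zip(native_chains, perm))
        score = score_fn(mapping)
        if score > best_score:
            best_map, best_score = mapping, score
    return best_map, best_score
```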
Conclusions
DockQ is a continuous protein-protein docking model quality score, performing as well as the three original CAPRI measures (Fnat, LRMS, iRMS) in segregating models into the four different CAPRI quality classes. If the CAPRI measures are already calculated, it is simple to calculate
DockQ using Eq 2 with d1 = 8.5 and d2 = 1.5. Since DockQ essentially recapitulates the CAPRI
classification almost perfectly, it can be viewed as a higher resolution version of the CAPRI
classification. The fact that it is continuous makes it possible to estimate model quality in a more quantitative way, using Z-scores or sums over top-ranked models, which has been so valuable for the CASP community. It should also be very useful for comparing the performance of
energy functions used for ranking and scoring docking models in more detail, by analyzing
complete rankings (not only top ranked), correlations, and DockQ vs. energy scatter plots. In
addition, DockQ can be used as a target function in developing new knowledge-based scoring
functions using for instance machine learning, a feature that has been investigated in a separate
study [16]. To simplify the calculation of DockQ we provide a stand-alone program that, given the atomic coordinates of a docking model and the native structure, calculates all CAPRI measures and the DockQ score.
Supporting Information
S1 File. Supporting Figures. Figure A. Demonstration of the Inverse Square Scaling technique. The scaling parameter, k, describes the raw score (X) at half-maximal height (0.5) of the scaled score, Y. k is set to 8.5 in this example illustration, which is the optimized value for
LRMS. Figure B. Heat map with average F1-values for the optimization of d1 and d2 on the
MOAL-set. Each value is smoothed by taking an average over its nearest neighbors to remove
the effect of outliers.
(PDF)
Author Contributions
Conceptualization: BW.
Funding acquisition: BW.
Investigation: SB BW.
Methodology: BW.
Resources: BW.
Software: SB BW.
Supervision: BW.
Writing – original draft: SB BW.
Writing – review & editing: SB BW.
References
1. The Protein Structure Initiative: achievements and visions for the future [Internet]. [cited 15 Jun 2016].
Available: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3318194/
2. 3did: a catalog of domain-based interactions of known three-dimensional structure [Internet]. [cited 15
Jun 2016]. Available: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3965002/
3. Zemla A, Venclovas C, Moult J, Fidelis K. Processing and analysis of CASP3 protein structure predictions. Proteins. 1999; Suppl 3: 22–29. PMID: 10526349
4. Siew N, Elofsson A, Rychlewski L, Fischer D. MaxSub: an automated measure for the assessment of
protein structure prediction quality. Bioinformatics. 2000; 16: 776–785. PMID: 11108700
5. Zhang Y, Skolnick J. Scoring function for automated assessment of protein structure template quality.
Proteins. 2004; 57: 702–710. doi: 10.1002/prot.20264 PMID: 15476259
6. Cristobal S, Zemla A, Fischer D, Rychlewski L, Elofsson A. A study of quality measures for protein
threading models. BMC Bioinformatics. 2001; 2: 5. PMID: 11545673
7. Gao M, Skolnick J. New benchmark metrics for protein-protein docking methods. Proteins. 2011; 79:
1623–1634. doi: 10.1002/prot.22987 PMID: 21365685
8. Lensink MF, Wodak SJ. Docking, scoring, and affinity prediction in CAPRI. Proteins. 2013; 81: 2082–
2095. doi: 10.1002/prot.24428 PMID: 24115211
9. Méndez R, Leplae R, De Maria L, Wodak SJ. Assessment of blind predictions of protein–protein inter-
actions: Current status of docking methods. Proteins Struct Funct Bioinforma. 2003; 52: 51–67. doi: 10.
1002/prot.10393
10. Lensink MF, Méndez R, Wodak SJ. Docking and scoring protein complexes: CAPRI 3rd Edition. Pro-
teins. 2007; 69: 704–718. doi: 10.1002/prot.21804 PMID: 17918726
11. Chen R, Li L, Weng Z. ZDOCK: an initial-stage protein-docking algorithm. Proteins. 2003;52.
12. Bernauer J, Aze J, Janin J, Poupon A. A new protein-protein docking scoring function based on inter-
face residue properties. Bioinformatics. 2007; 23.
13. Moal IH, Torchala M, Bates PA, Fernández-Recio J. The scoring of poses in protein-protein docking:
current capabilities and future directions. BMC Bioinformatics. 2013; 14: 286. doi: 10.1186/1471-2105-
14-286 PMID: 24079540
14. Ray A, Lindahl E, Wallner B. Improved model quality assessment using ProQ2. BMC Bioinformatics.
2012; 13: 224. doi: 10.1186/1471-2105-13-224 PMID: 22963006
15. Cao R, Bhattacharya D, Adhikari B, Li J, Cheng J. Large-scale model quality assessment for improving
protein tertiary structure prediction. Bioinformatics. 2015; 31: i116–i123. doi: 10.1093/bioinformatics/
btv235 PMID: 26072473
16. Basu S, Wallner B. Finding correct protein–protein docking models using ProQDock. Bioinformatics.
2016; 32: i262–i270. doi: 10.1093/bioinformatics/btw257 PMID: 27307625
17. Hwang H, Vreven T, Pierce B, Hung J-H, Weng Z. Performance of ZDOCK and ZRANK in CAPRI
Rounds 13–19. Proteins. 2010; 78: 3104–3110. doi: 10.1002/prot.22764 PMID: 20936681
18. Torchala M, Moal IH, Chaleil RAG, Fernandez-Recio J, Bates PA. SwarmDock: a server for flexible pro-
tein–protein docking. Bioinformatics. 2013; 29: 807–809. doi: 10.1093/bioinformatics/btt038 PMID:
23343604
19. Lensink MF, Wodak SJ. Score_set: a CAPRI benchmark for scoring protein complexes. Proteins.
2014; 82: 3163–3169. doi: 10.1002/prot.24678 PMID: 25179222
20. Soto C, Ofek G, Joyce MG, Zhang B, McKee K, Longo NS, et al. Developmental Pathway of the MPER-
Directed HIV-1-Neutralizing Antibody 10E8. PloS One. 2016; 11: e0157409. doi: 10.1371/journal.pone.
0157409 PMID: 27299673
21. Zeng L, Sullivan LC, Vivian JP, Walpole NG, Harpur CM, Rossjohn J, et al. A structural basis for antigen
presentation by the MHC class Ib molecule, Qa-1b. J Immunol Baltim Md 1950. 2012; 188: 302–310.
doi: 10.4049/jimmunol.1102379
22. Stewart M, Kent HM, McCoy AJ. Structural basis for molecular recognition between nuclear transport
factor 2 (NTF2) and the GDP-bound form of the Ras-family GTPase Ran. J Mol Biol. 1998; 277: 635–
646. doi: 10.1006/jmbi.1997.1602 PMID: 9533885
23. Kong L, He L, de Val N, Vora N, Morris CD, Azadnia P, et al. Uncleaved prefusion-optimized gp140 tri-
mers derived from analysis of HIV-1 envelope metastability. Nat Commun. 2016; 7: 12040. doi: 10.
1038/ncomms12040 PMID: 27349805
24. Halldorsson S, Behrens A-J, Harlos K, Huiskonen JT, Elliott RM, Crispin M, et al. Structure of a phlebo-
viral envelope glycoprotein reveals a consolidated model of membrane fusion. Proc Natl Acad Sci U S
A. 2016; 113: 7154–7159. doi: 10.1073/pnas.1603827113 PMID: 27325770
25. Zhao Y, Ren J, Harlos K, Jones DM, Zeltina A, Bowden TA, et al. Toremifene interacts with and destabi-
lizes the Ebola virus glycoprotein. Nature. 2016; 535: 169–172. doi: 10.1038/nature18615 PMID:
27362232
26. Gulbis JM, Zhou M, Mann S, MacKinnon R. Structure of the cytoplasmic beta subunit-T1 assembly of
voltage-dependent K+ channels. Science. 2000; 289: 123–127. PMID: 10884227