This document describes research using an ensemble of heterogeneous flexible neural trees (FNTs) to predict the dissolution profiles of poly(lactic-co-glycolic acid) (PLGA) micro- and nanoparticles. The researchers trained multiple FNT regression models on different feature subsets and parameter settings. They then combined the models using an ensemble approach to improve predictive accuracy. Their best model achieved a root mean square error of 11.541 on test data, an improvement over other methods. Feature selection identified the most influential factors on PLGA dissolution. The ensemble of diverse FNT models and feature selection led to more accurate PLGA dissolution profile predictions.
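The core idea of the summary above (train diverse regression models on different feature subsets, then average their outputs) can be sketched in a few lines. This is a minimal numpy illustration with synthetic stand-in data and plain linear least-squares members in place of FNTs; all sizes, names, and the member count are illustrative, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in data (the real PLGA dataset has ~300 features; here we use 20).
X = rng.normal(size=(200, 20))
true_w = rng.normal(size=20)
y = X @ true_w + rng.normal(scale=0.5, size=200)
X_train, y_train = X[:150], y[:150]
X_test, y_test = X[150:], y[150:]

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Train several linear least-squares models, each on a random feature subset
# (the "random subspace" idea the paper applies with heterogeneous FNTs).
models = []
for _ in range(10):
    feats = rng.choice(X.shape[1], size=8, replace=False)
    w, *_ = np.linalg.lstsq(X_train[:, feats], y_train, rcond=None)
    models.append((feats, w))

# Ensemble prediction: simple mean of the member outputs.
preds = np.mean([X_test[:, f] @ w for f, w in models], axis=0)
print(round(rmse(y_test, preds), 3))
```

The averaging step is the simplest non-trainable combination rule; the slides later describe rank-based weighting as a refinement.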
An introduction to variable and feature selection - Marco Meoni
Presentation of a great paper from Isabelle Guyon (Clopinet) and André Elisseeff (Max Planck Institute) back in 2003, which outlines the main techniques for feature selection and model validation in machine learning systems
Multi-Domain Diversity Preservation to Mitigate Particle Stagnation and Enab... - Weiyang Tong
This paper makes important advancements to a Particle Swarm Optimization (PSO) algorithm that seeks to address the major complex attributes of engineering optimization problems, namely multiple objectives, high nonlinearity, high dimensionality, constraints, and mixed-discrete variables. To introduce these capabilities while keeping PSO competitive with other powerful multi-objective algorithms (e.g., NSGA-II, SPEA, and PAES), it is important not only to preserve population diversity (to mitigate stagnation), but also to apply explicit diversity preservation to facilitate improved convergence to (non-convex) Pareto frontiers. A new multi-domain preservation technique is presented in this paper for this purpose. In this technique, an adaptive repulsion is applied to each global leader to slow down the clustering of particles around overly popular global leaders and maintain a desirably even distribution of Pareto optimal solutions. In addition, global leader selection is modified to follow a stochastic selection based on a half-Gaussian distribution. Specifically, two different population diversity measures are explored: (i) one based on the smallest hypercube enclosing the entire population, and (ii) one based on the smallest hypercube enclosing the subset of particles following each global leader. Both strategies are investigated using a suite of benchmark problems. The performance of the new PSO algorithm is compared with other algorithms in terms of convergence measure, uniformity measure, and computation time.
How predictive models help Medicinal Chemists design better drugs_webinar - Ann-Marie Roche
All scientific disciplines, including medicinal chemistry, are generating data at unprecedented rates, and the subsequent analysis and exploitation of these data are increasingly fundamental to innovation. Using data to design better compounds is a challenge for medicinal and computational chemists.
The design of small-molecule drug candidates, encompassing characteristics such as potency, selectivity and ADMET (absorption, distribution, metabolism, excretion and toxicity) is a key factor in the success of clinical trials and computer-aided drug discovery/design methods have played a major role in the development of therapeutically important small molecules for over three decades. These methods are broadly classified as either structure-based or ligand-based.
In this webinar our expert Dr. Olivier Barberan will discuss ligand-based methods and he will cover the following:
How to use only ligand information to predict activity depending on its similarity/dissimilarity to previously known active ligands.
- Discuss ligand-based pharmacophores, molecular descriptors, and quantitative structure-activity relationships and important tools such as target/ligand databases necessary for successful implementation of various computer-aided drug discovery/design methods in a drug discovery campaign.
Metabolomic Data Analysis Workshop and Tutorials (2014) - Dmitry Grapov
Get more information:
http://imdevsoftware.wordpress.com/2014/10/11/2014-metabolomic-data-analysis-and-visualization-workshop-and-tutorials/
Recently I had the pleasure of teaching statistical and multivariate data analysis and visualization at the annual Summer Sessions in Metabolomics 2014, organized by the NIH West Coast Metabolomics Center.
Similar to last year, I’ve posted all the content (lectures, labs and software) for anyone to follow along with at their own pace. I also plan to release videos for all the lectures and labs.
A GENETIC-FROG LEAPING ALGORITHM FOR TEXT DOCUMENT CLUSTERING - Lubna_Alhenaki
In this project, a new optimization methodology was developed: a Genetic Algorithm was used for feature selection, combined with the Shuffled Frog Leaping algorithm for clustering text documents.
This workshop is intended for those who are interested in, or are in the planning stages of, conducting an RNA-Seq experiment. Topics to be discussed will include:
* Experimental Design of RNA-Seq experiment
* Sample preparation, best practices
* High throughput sequencing basics and choices
* Cost estimation
* Differential Gene Expression Analysis
* Data cleanup and quality assurance
* Mapping your data
* Assigning reads to genes and counting
* Analysis of differentially expressed genes
* Downstream analysis/visualizations and tables
Performance Issue? Machine Learning to the rescue! - Maarten Smeets
It can be difficult to determine how to improve the performance of microservices. There are many factors you can vary, but which factor will have the most impact? During this presentation, a method using the random forest machine learning algorithm is applied to help improve the performance of a microservice running inside a JVM. Several measures are taken, such as throughput and response times. Java version, JVM supplier, heap, garbage collection algorithm and microservice framework are all varied. Which factor is most important in determining the response time and throughput of the services? The Random Forest algorithm is introduced to solve this challenge. Not only does this presentation give some useful suggestions for improving the performance of microservices, but it also introduces a novel way to take on the challenge of performance tuning which can be applied to other use cases. This presentation is especially interesting to developers and architects.
The ionization state of a chemical, reflected in pKa values, affects lipophilicity, solubility, protein binding and the ability of a chemical to cross the plasma membrane. These properties govern the pharmacokinetic parameters such as absorption, distribution, metabolism, excretion and toxicity and thus pKa is a fundamental chemical property and is used in many models of chemical toxicity.
Experimentally determining pKa is not feasible for high-throughput assays. Predicting pKa is challenging and existing models have been developed only using restricted chemical space (e.g., anilines, phenols, benzoic acids, primary amines) and lack of a generalized model impedes ADME modeling.
No free and open-source models exist for heterogeneous chemical classes; however, several proprietary programs do. In this work, pKa open data bundled with DataWarrior (http://www.openmolecules.org/) were used to develop predictive models for pKa. After data cleaning, there were ~3100 and ~3900 monoprotic chemicals with an acidic or basic pKa, respectively. 1D and 2D chemical descriptors (AlogP, topological polar surface area, etc.), in addition to 12 fingerprints (presence or absence of a chemical group), were generated using the PaDEL software. Three datasets were used: acidic, basic, and acidic and basic combined.
Thirteen datasets were examined: the 1D/2D descriptors and the 12 fingerprints. Using the Extreme Gradient Boosting algorithm, the MACCS and Substructure Count fingerprints yielded the best results, with models showing an R-squared of ~0.78 and an RMSE of 1.42.
Recently, Deep Learning models have shown remarkable progress in image recognition and natural language processing. To determine whether Deep Learning algorithms would increase model performance, we examined the datasets and found that the Deep Learning models were somewhat superior to Extreme Gradient Boosting, with an R-squared of ~0.80 and an RMSE of ~1.38.
This work does not reflect U.S. EPA policy.
Adam Weinglass and Mary Jo Wildey from Merck & Co. share their winning presentation from SLAS2017 in Washington, DC. Join the conversation in the SLAS Screen Design and Assay Technology Special Interest Group LinkedIn group at https://www.linkedin.com/groups/3867725.
Use of spark for proteomic scoring seattle presentation - lordjoe
Slides presented to the Seattle Spark Meetup on August 12 2015 - Note the work on Accumulators is a separate GitHub project https://github.com/lordjoe/SparkAccumulators
This is the last presentation of the BITS training on 'Comparative genomics'.
It reviews the Contra tool for detecting common transcription factor binding sites in sequences.
Thanks to Stefan Broos of the DMBR department of VIB
Similar to Ensemble of Heterogeneous Flexible Neural Tree for the approximation and feature-selection of poly (lactic-co-glycolic acid) micro- and nanoparticle (20)
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ... - Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables ranks to be calculated in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition: the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by the submission of a large number of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. for more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
Adjusting primitives for graph : SHORT REPORT / NOTES - Subhajit Sahu
Graph algorithms such as PageRank commonly use Compressed Sparse Row (CSR), an adjacency-list based graph representation.
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
Ensemble of Heterogeneous Flexible Neural Tree for the approximation and feature-selection of poly (lactic-co-glycolic acid) micro- and nanoparticle
1. Ensemble of Heterogeneous Flexible Neural Tree for the approximation and feature-selection of poly (lactic-co-glycolic acid) micro- and nanoparticle
Varun Kumar Ojha1, Ajith Abraham1, Vaclav Snasel1
1IT4Innovations, VŠB Technical University of Ostrava, Ostrava, Czech
Republic
{varun.kumar.ojha, ajith.abraham, vaclav.snasel}@vsb.cz
Second International Afro-European Conference for Industrial Advancement
September 9-11, 2015, Villejuif, France
2. Problem
• Problem: Prediction of the dissolution profile of Poly Lactic-co-Glycolic Acid (PLGA) micro- and nanoparticles.
• Motivation: PLGA micro-particles are important diluents in the formulation of drugs in the dosage form.
• PLGA acts as an excipient in drug formulation.
• It helps the dissolution of drugs, thus increasing their absorbability and solubility.
• It helps the pharmaceutical manufacturing process by improving the API powder's flowability and non-stickiness.
3. Approach
• Critical Issue: PLGA dissolution prediction is a complex problem, as there are several potential factors influencing the dissolution of PLGA protein particles. Collecting all such influencing factors leads to three hundred input features in the dataset.
• Background: Szlek et al.1 in their article offered a dataset with three hundred input features divided into four groups, namely protein descriptors, plasticizer, formulation characteristics, and emulsifier, collected from the literature.
• Goal: Dimensionality reduction using feature selection and finding a
suitable prediction model.
1Szlek, J., Paclawski, A., Lau, R., Jachowicz, R., Mendyk, A.: Heuristic modeling of macromolecule release from PLGA microspheres.
International journal of nanomedicine 8 (2013) 4601.
4. The PLGA dataset description
# | Group name | # features | Importance
1 | Protein descriptors | 85 | Describes the type of molecules and proteins used
2 | Formulation characteristics | 17 | Describes molecular properties such as molecular weight, particle size, etc.
3 | Plasticizer | 98 | Describes properties such as fluidity of the material used
4 | Emulsifier | 99 | Describes the properties of stabilizing/increasing the pharmaceutical product's life
5 | Time in days | 1 | Time taken to dissolve
6 | % of molecules dissolved | 1 | PLGA micro-/nanoparticle dissolution rate
5. Methodology
[Feature Selection and Regression Model Training]
(Training of regression models in order to discover the input-output relationship)
↓
[Ensemble of regression models]
(To exploit the goodness of all trained regression models instead of relying on a single best one)
↓
[Selected features - Regression models/Ensemble model]
6. Flexible Neural Tree (FNT)
Objective: For a dataset with n independent variables X and a dependent variable Y, an approximation model tries to find the relationship between them. Moreover, it tries to find the unknown parameters θ such that the root mean square error (RMSE) between the model's output Ŷ and the actual output Y is zero. We may write the RMSE as

RMSE = sqrt( (1/N) * Σ_{i=1}^{N} (y_i - ŷ_i)^2 ),

where N is the number of examples.
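The RMSE formula on this slide translates directly to code; a small helper, assuming numpy is available:

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean square error: sqrt((1/N) * sum_i (y_i - y_hat_i)^2)."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0
print(rmse([0.0, 0.0], [3.0, 4.0]))            # sqrt((9 + 16) / 2) ≈ 3.536
```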
7. Flexible Neural Tree
• Analogy with a Neural Network
• Function Node: resembles the active nodes.
• Leaf Node: indicates the input nodes.
• Edge: indicates the synaptic weights.
• Root Node: indicates the output node.
• Structure Optimization: finding an optimal or near-optimal neural tree is formulated as a product of evolution. For that purpose, Genetic Programming may be used.
• Parameter Optimization: Particle Swarm Optimization (PSO), Artificial Bee Colony, etc. may be used for parameter optimization.
• Input Feature Selection: leaf nodes represent input features that may be selected randomly.
8. Metaheuristics
• Finding a solution to a problem using certain rules or mechanisms that may be inspired by nature.
• The operators of metaheuristics:
• Transition: searching for solutions (exploration and exploitation).
• Evaluation: evaluating the objective function.
• Determination: deciding the search directions.
• Verifying Goal: convergence.
9. Evolutionary Algorithms
• Evolutionary Algorithms
• A genetic, population-based metaheuristic algorithm that finds an optimal solution using the dynamics of the evolutionary process. It basically uses genetic operators such as:
• Selection
• Crossover
• Mutation
• Genetic Programming (GP)
• Introduced by John Koza, 1992.
• The basic concept of GP is to evolve a program instead of a bit-string,
• i.e., the genetic operators are applied directly to the Phenotype rather than to the Genotype.
• It searches for an optimum tree structure (Phenotype) in a program space.
11. Mutation Operator
• Mutation at a single leaf node.
• Mutation at all leaf nodes.
• Mutation by pruning a sub-tree and replacing it with a randomly generated sub-tree.
• Mutation by growing a tree / appending a randomly generated sub-tree.
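The four mutation operators listed on this slide can be sketched on a toy tree structure. Nested lists stand in for an FNT here; the representation and function names are illustrative, not the paper's implementation:

```python
import random

# Toy FNT-style tree: ['f', children] is a function node, ['x', i] a leaf
# reading input feature i (illustrative structure, not the paper's code).

def random_leaf(n, rng):
    return ['x', rng.randrange(n)]

def all_leaves(t):
    return [t] if t[0] == 'x' else [l for c in t[1] for l in all_leaves(c)]

def mutate_single_leaf(t, n, rng):
    # Mutation at a single leaf node: redraw one leaf's input feature.
    leaf = rng.choice(all_leaves(t))
    leaf[1] = rng.randrange(n)

def mutate_all_leaves(t, n, rng):
    # Mutation at all leaf nodes.
    for leaf in all_leaves(t):
        leaf[1] = rng.randrange(n)

def prune_and_replace(t, n, rng):
    # Prune a sub-tree and replace it with a randomly generated sub-tree.
    if t[0] == 'f' and t[1]:
        i = rng.randrange(len(t[1]))
        t[1][i] = ['f', [random_leaf(n, rng), random_leaf(n, rng)]]

def grow(t, n, rng):
    # Grow the tree by appending a randomly generated sub-tree.
    if t[0] == 'f':
        t[1].append(random_leaf(n, rng))

rng = random.Random(1)
tree = ['f', [['x', 0], ['f', [['x', 1], ['x', 2]]]]]
grow(tree, 5, rng)
print(len(all_leaves(tree)))  # one more leaf than before: 4
```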
12. Metaheuristics for Parameter Optimization
• Differential Evolution (Storn and Price, 1995): an Evolutionary Algorithm based optimization algorithm [operators: selection and crossover].
• Swarm-Based Metaheuristics
• Particle Swarm Optimization (Eberhart and Kennedy, 1995) is a population-based metaheuristic algorithm that imitates the mechanisms of the foraging behavior of swarms. It depends on the velocity and position updates of the particles in a swarm.
• Artificial Bee Colony (Karaboga, 2005) is a metaheuristic algorithm inspired by the foraging behavior of honey bee swarms. It depends on food positions that are updated by the artificial bees in an iterative fashion.
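The PSO velocity and position updates mentioned above can be sketched on a toy objective. The inertia and acceleration constants below are common defaults, not the settings from this work:

```python
import numpy as np

rng = np.random.default_rng(0)

def sphere(x):
    return float(np.sum(x ** 2))  # toy objective; minimum 0 at the origin

# Minimal PSO sketch (velocity/position update per Eberhart & Kennedy, 1995).
n, dim, w, c1, c2 = 20, 3, 0.7, 1.5, 1.5
pos = rng.uniform(-1.0, 1.0, (n, dim))
vel = np.zeros((n, dim))
pbest = pos.copy()                              # personal best positions
pbest_f = np.array([sphere(p) for p in pos])
gbest = pbest[np.argmin(pbest_f)].copy()        # global best position

for _ in range(100):
    r1, r2 = rng.random((n, dim)), rng.random((n, dim))
    # Velocity update: inertia + cognitive pull + social pull.
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = pos + vel
    f = np.array([sphere(p) for p in pos])
    better = f < pbest_f
    pbest[better], pbest_f[better] = pos[better], f[better]
    gbest = pbest[np.argmin(pbest_f)].copy()

print(f"best objective: {sphere(gbest):.3e}")
```

In the FNT pipeline, the vector being optimized would hold the tree's edge weights and activation-function arguments rather than the coordinates of a toy function.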
13. Ensemble
• A collective decision made with the consensus of many members is better than the decision of an individual.
• Two components of an Ensemble:
• Construction of diverse and accurate models
• Training models with different sets of data (Bagging)
• Training models with different sets of input features (Random Subspace)
• Training models with different sets of parameters
• Combining the models using combination rules
• Non-trainable
• Trainable
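The combination rules named on this slide can be sketched on toy member predictions; all numbers below are made up for illustration:

```python
import numpy as np

# Toy predictions from three ensemble members on three test points.
preds = np.array([[10.2,  9.8, 10.5],   # model 1
                  [11.0, 10.1, 10.9],   # model 2
                  [ 9.9, 10.0, 10.4]])  # model 3

# Non-trainable rule: simple mean of the member outputs.
mean_pred = preds.mean(axis=0)

# Rank-based weighted mean: better-ranked (lower-RMSE) members get more weight.
member_rmse = np.array([1.2, 2.0, 1.5])       # assumed validation scores
ranks = member_rmse.argsort().argsort() + 1   # 1 = best member
weights = (len(ranks) - ranks + 1).astype(float)
weights /= weights.sum()
weighted_pred = weights @ preds

# Classification counterpart: majority voting over member class labels.
votes = np.array([[0, 1, 1], [0, 1, 0], [1, 1, 0]])
majority = (votes.sum(axis=0) > votes.shape[0] / 2).astype(int)
print(majority.tolist())  # [0, 1, 0]
```

A trainable rule would instead fit the weights themselves (e.g. by minimizing validation RMSE) rather than deriving them from ranks.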
14. Ensemble of FNTs
• Making use of the final population
• Diversity:
• Models in the final population can have different input features.
• Models in the final population can have different structures.
• Models in the final population can have different active nodes.
• Combination of FNTs
• Regression problem: mean of outputs, weighted mean (rank-based or trainable).
• Classification problem: majority voting, weighted majority voting (rank-based or trainable).
15. Experiment Design and Parameter Set-Up
# | Parameter Name | Parameter Utility | Values
1 | Tree Height | Maximum number of levels of the FNT | 5
2 | Tree Arity | Maximum number of children of a node | 10
3 | Tree Node Type | Type of activation function at the nodes | Gaussian
4 | GP Population | Number of candidates in the genetic population | 30
5 | Mutation probability | Probability that a candidate will be mutated in genetic programming | 0.4
6 | Crossover probability | Probability that candidates will take part in a crossover operation | 0.5
7 | Elitism | Probability that a candidate will propagate to the next generation as-is | 0.1
8 | Tournament Size | Size of the pool used for the selection of candidates | 15
9 | MH Algorithm Population | Initial size of the swarm (population) | 50
10 | MH Algorithm Node Range | Search space of the transfer-function arguments | [0,1]
11 | MH Algorithm Edge Range | Search space for the edges (weights) of the tree | [-1.0,1.0]
13 | Maximum Structure Iteration | Maximum number of generations of genetic programming | 100000
14 | Maximum Parameter Iteration | Maximum number of evaluations of parameter optimization | 10000
17. Comparison
# | RMSE | Features | Model | Literature
1 | 12.885 | 15 | FNT | Current work
2 | 13.34 | 15 | REP Tree [24] | Ojha et al. [12]
3 | 14.3 | 17 | MLP [16] | Szlęk et al. [3]
4 | 14.88 | 15 | GP Regression [23] | Ojha et al. [12]
5 | 15.2 | 15 | MLP [16] | Ojha et al. [12]
6 | 15.4 | 11 | MLP [16] | Szlęk et al. [3]
18. Feature Selection
# | Feature | Index | Abbreviation | Probability
1 | Time Days | 299 | TD | 0.94
2 | Prod method | 100 | PM | 0.83
3 | PVA conc. inner phase | 88 | PVA | 0.78
4 | Ring atom count | 110 | RAC | 0.61
5 | Heteroaliphatic ring count | 23 | HIRC | 0.50
6 | Aliphatic bond count | 104 | ABC | 0.44
7 | Diss. add | 98 | DA | 0.44
8 | pH 11 msdon | 195 | PH11MD | 0.39
9 | pH 12 msacc | 181 | PH12MC | 0.39
10 | Ring count | 23 | RC | 0.39
11 | a(yy) | 119 | AYY | 0.28
12 | Chain bond count | 213 | CBC | 0.28
13 | Diss. add conc. | 99 | DAC | 0.28
14 | Fragment count | 133 | FC | 0.28
15 | Aromatic ring count | 24 | ARC | 0.22
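Selection probabilities like those in the table above can be computed as the fraction of models in the final population whose trees use a given input feature. A toy sketch, with random feature subsets standing in for the leaf nodes of evolved FNTs:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy final population: 18 models over 6 features; each model "uses" a random
# subset of 3 features (a stand-in for the leaf nodes of an evolved FNT).
n_models, n_features = 18, 6
used = np.array([rng.permutation([True] * 3 + [False] * 3)
                 for _ in range(n_models)])

# Per-feature selection probability: fraction of models using each feature.
prob = used.mean(axis=0)
for i in np.argsort(-prob):
    print(f"feature {i}: {prob[i]:.2f}")
```

In the paper's terms, a feature with a high selection probability (e.g. Time Days at 0.94) appeared in nearly every model of the final population, which is what marks it as significant.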
19. Conclusion
• The aim of the PLGA dissolution-rate prediction experiment was to find the significant variables that govern the dissolution rate and to create a model for realizing the PLGA dissolution profile.
• Our current experiment provides insight into PLGA dissolution-rate prediction. We have discovered a list of the most significant features by computing their probability of selection using our model (the higher the probability of selection, the higher the significance for prediction).
• We achieved high accuracy using the FNT model.
• An ensemble of distinct predictors (models) helped to further improve the achieved accuracy.