Dimensionality Reduction and Prediction of the Protein Macromolecule Dissolution Profile

Introduction
Methodology
Experimental Results
Conclusions
Dimensionality Reduction and Prediction of the
Protein Macromolecule Dissolution Profile
V. K. Ojha, K. Jackowski, V. Snášel and A. Abraham
IT4Innovations
VŠB - Technical University of Ostrava
Czech Republic
24 June 2014
Varun Ojha, IBICA 2014

Introduction
Problem: Prediction of the dissolution profile of
poly(lactic-co-glycolic acid) (PLGA) micro- and nanoparticles.
Motivation: PLGA microparticles are important diluents in the
formulation of drugs in dosage form.
They act as an excipient in drug formulation.
They aid dissolution of the drug, thus increasing its
absorbability and solubility.
They help the pharmaceutical manufacturing process by improving
the flowability and non-stickiness of API powders.

Critical Issue: PLGA dissolution prediction is a complex
problem because several potential factors influence the
dissolution of PLGA protein particles. Collecting all such
influencing factors leads to three hundred input features in
the dataset.
Background: Szlęk et al. [1] offered a dataset with three
hundred input features, collected from various literature and
divided into four groups: protein descriptors, plasticizer,
formulation characteristics, and emulsifier.
Goal: Dimensionality reduction using feature
selection/extraction and finding a suitable regression model.
[1] Szlęk, J., Pacławski, A., Lau, R., Jachowicz, R., Mendyk, A.: Heuristic modeling of macromolecule release
from PLGA microspheres. International Journal of Nanomedicine 8 (2013) 4601.

Overview
Figure: A complete overview of the experimental setup. The dataset
passes through dimension reduction, either feature selection or
feature extraction (linear: PCA, FA, ICA; nonlinear: kPCA, MDS). The
prediction models (GPReg, LReg, MLP, SMOReg) are evaluated on 10
cross-validation sets, and the results are used to select the
dimension reduction technique and the prediction model.

Feature Selection
A Backward Feature Elimination (BFE) filter is used for feature
elimination.
BFE starts with the maximum number of features in hand (in this
case, three hundred features) and eliminates features one by
one in an iterative manner.
At each iteration, the resulting prediction accuracy is
evaluated for all combinations of the remaining attributes, and
the subset of attributes with the highest accuracy is propagated
to the next iteration.
The subset with the best accuracy is chosen.
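
The elimination loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the tool used in the study. It assumes that each iteration forms its candidate subsets by dropping exactly one remaining feature, and the function names and the toy error function are hypothetical. In practice the error function would be the cross-validated RMSE of a regression model trained on the given subset.

```python
from typing import Callable, Sequence, Tuple

Subset = Tuple[int, ...]


def backward_feature_elimination(
    features: Sequence[int],
    error_of: Callable[[Subset], float],
) -> Tuple[Subset, float]:
    """Greedy backward elimination; `error_of` returns an error (lower is better)."""
    current: Subset = tuple(features)
    best_subset, best_error = current, error_of(current)
    while len(current) > 1:
        # Form every subset obtained by dropping exactly one remaining feature.
        candidates = [current[:i] + current[i + 1:] for i in range(len(current))]
        # Propagate the most accurate candidate to the next iteration.
        scored = [(error_of(c), c) for c in candidates]
        error, current = min(scored, key=lambda pair: pair[0])
        # Remember the best subset seen over all iterations.
        if error < best_error:
            best_subset, best_error = current, error
    return best_subset, best_error


# Toy error function (an assumption for illustration): features 0 and 2 are
# informative, every other feature only adds noise.
def toy_error(subset: Subset) -> float:
    missing = len({0, 2} - set(subset))
    return 10.0 * missing + len(subset)


selected, error = backward_feature_elimination(range(5), toy_error)
print(selected, error)
```

With three hundred features this greedy search evaluates on the order of 300 * 300 / 2 subsets, which is why the cost of a single model fit matters.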

Feature Extraction
Feature extraction helps reduce the computational overhead that
may be incurred by using the complete input dimension.
Principal Component Analysis (PCA)
Factor Analysis (FA)
Independent Component Analysis (ICA)
Kernel PCA (kPCA)
Multidimensional Scaling (MDS)
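
To make the idea of extracting new features concrete, here is a minimal PCA sketch in plain Python. It is an illustration under stated assumptions, not the implementation used in the experiments: it extracts only the first principal component (the study reduces to 30 dimensions), it uses power iteration rather than a full eigendecomposition, and the function names and toy data are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def center(rows: Matrix) -> Matrix:
    """Subtract the per-feature mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in rows]


def first_principal_component(rows: Matrix, iterations: int = 100) -> List[float]:
    """Approximate the unit direction of largest variance by power iteration."""
    centered = center(rows)
    n, d = len(centered), len(centered[0])
    # Sample covariance matrix of the features (d x d).
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    direction = [1.0] * d  # arbitrary non-zero starting vector
    for _ in range(iterations):
        product = [sum(cov[a][b] * direction[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in product))
        direction = [x / norm for x in product]
    return direction


def project(rows: Matrix, direction: List[float]) -> List[float]:
    """Replace each row by its coordinate along `direction` (one new feature)."""
    return [sum(x * w for x, w in zip(row, direction)) for row in center(rows)]


# Toy data (assumed): two perfectly correlated features, so one component suffices.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
component = first_principal_component(data)
new_feature = project(data, component)
print(component, new_feature)
```

The projected values form a new input feature that replaces the original columns, which is how feature extraction differs from feature selection.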

Regression models
A regression (prediction) model estimates the relationship
between the input variables and the output variable.
Linear regression (LReg)
Gaussian Process Regression (GPReg)
Multilayer perceptron (MLP)
Sequential Minimal Optimization Regression (SMOReg)
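
As a small illustration of what fitting a regression model means, the sketch below fits the simplest of the four, a linear regression, by ordinary least squares on a single input variable. The function names and the toy data are assumptions for illustration; the models in the study work on many input features.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept (single input)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(x: float, slope: float, intercept: float) -> float:
    """Apply the fitted relationship to a new input value."""
    return slope * x + intercept


# Toy data (assumed) generated from y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept, predict(5.0, slope, intercept))
```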

Experimental results of feature selection technique
Figure: Experimental results of feature selection, comparison between the
regression models (GPReg, LReg, MLP, SMOReg). (a) Average RMSE versus
number of selected features (1, 5, 10, Optimal, 300). (b) Variance versus
number of selected features.

Table: Experimental results for 10cv datasets prepared with distinct
random partitions of the complete dataset, using the feature selection
technique (identification of the regression model). Note: mean and
variance (VAR) are computed over the 10 RMSE values obtained.
Regression  Reduced Number of Features
Model       1               5               10              Optimal         300
            Mean    VAR     Mean    VAR     Mean    VAR     Mean    VAR     Mean    VAR
GPReg       27.474  10.942  17.107  3.989   15.322  3.782   15.709  3.162   16.812  3.551
LReg        26.613  3.232   23.447  3.702   19.979  3.402   17.847  1.634   17.074  2.738
MLP         28.329  7.428   23.113  10.007  20.997  11.365  17.820  8.095   18.571  21.063
SMOReg      26.970  3.307   23.381  2.729   19.526  3.757   17.885  3.321   16.529  2.554
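
The Mean and VAR columns summarise 10 RMSE values per configuration. The sketch below shows one way such a summary could be computed. Several details are assumptions, since the slides do not specify them: each run uses a random hold-out split, VAR is taken as the sample variance, and the placeholder model simply predicts the mean of the training targets. All names are hypothetical.

```python
import math
import random
import statistics
from typing import Callable, List, Sequence, Tuple

Model = Callable[[float], float]


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def fit_mean(train_x: Sequence[float], train_y: Sequence[float]) -> Model:
    """Placeholder model (assumed): always predict the mean training target."""
    mean_y = statistics.mean(train_y)
    return lambda x: mean_y


def rmse_over_partitions(
    xs: Sequence[float],
    ys: Sequence[float],
    fit: Callable[[List[float], List[float]], Model],
    runs: int = 10,
    test_fraction: float = 0.1,
    seed: int = 0,
) -> Tuple[float, float]:
    """Return (mean, sample variance) of the RMSE over `runs` random partitions."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    scores = []
    for _ in range(runs):
        rng.shuffle(indices)  # a distinct random partition for every run
        cut = max(1, int(len(indices) * test_fraction))
        test_idx, train_idx = indices[:cut], indices[cut:]
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        predictions = [model(xs[i]) for i in test_idx]
        scores.append(rmse([ys[i] for i in test_idx], predictions))
    return statistics.mean(scores), statistics.variance(scores)


# Toy data (assumed): y = 2x + 1 on 20 points.
toy_x = [float(i) for i in range(20)]
toy_y = [2.0 * x + 1.0 for x in toy_x]
mean_rmse, var_rmse = rmse_over_partitions(toy_x, toy_y, fit_mean)
print(mean_rmse, var_rmse)
```

A low mean with a high variance, as in some MLP rows, indicates a model whose accuracy depends strongly on the partition.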

Experimental results of feature extraction technique
Figure: Experimental results of feature extraction with reduced dimension
30, comparison between the regression models (GPReg, LReg, MLP, SMOReg).
(a) Average RMSE for each technique (ICA, PCA, FA, kPCA, MDS).
(b) Variance for each technique.

Table: Experimental results for 10cv datasets prepared with distinct
random partitions of the complete dataset, using the feature extraction
techniques (identification of the regression model). Note: mean and
variance (VAR) are computed over the 10 RMSE values obtained.
Regression  Feature Extraction Technique (reduced dimension 30)
Model       ICA             PCA             FA              kPCA            MDS
            Mean    VAR     Mean    VAR     Mean    VAR     Mean    VAR     Mean    VAR
GPReg       14.826  3.612   16.636  3.160   28.314  3.338   24.955  1.965   28.413  3.155
LReg        17.233  2.340   17.170  2.790   29.970  1.766   25.348  2.048   29.192  2.079
MLP         13.945  2.765   13.590  1.560   31.010  1.825   27.067  4.090   29.925  3.105
SMOReg      17.925  2.875   17.660  1.560   30.257  3.373   25.900  1.700   29.641  2.758
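
Selecting the technique and model from this table amounts to finding the entry with the lowest mean RMSE. The short sketch below does this with the values copied from the table. The selection rule (lowest mean, ignoring variance) is an assumption for illustration.

```python
from typing import Dict, Tuple

# (model, technique) -> (mean RMSE, VAR), copied from the table.
results: Dict[Tuple[str, str], Tuple[float, float]] = {
    ("GPReg", "ICA"): (14.826, 3.612), ("GPReg", "PCA"): (16.636, 3.160),
    ("GPReg", "FA"): (28.314, 3.338), ("GPReg", "kPCA"): (24.955, 1.965),
    ("GPReg", "MDS"): (28.413, 3.155),
    ("LReg", "ICA"): (17.233, 2.340), ("LReg", "PCA"): (17.170, 2.790),
    ("LReg", "FA"): (29.970, 1.766), ("LReg", "kPCA"): (25.348, 2.048),
    ("LReg", "MDS"): (29.192, 2.079),
    ("MLP", "ICA"): (13.945, 2.765), ("MLP", "PCA"): (13.590, 1.560),
    ("MLP", "FA"): (31.010, 1.825), ("MLP", "kPCA"): (27.067, 4.090),
    ("MLP", "MDS"): (29.925, 3.105),
    ("SMOReg", "ICA"): (17.925, 2.875), ("SMOReg", "PCA"): (17.660, 1.560),
    ("SMOReg", "FA"): (30.257, 3.373), ("SMOReg", "kPCA"): (25.900, 1.700),
    ("SMOReg", "MDS"): (29.641, 2.758),
}

# Pick the (model, technique) pair with the lowest mean RMSE.
best_pair = min(results, key=lambda pair: results[pair][0])
best_mean, best_var = results[best_pair]
print(best_pair, best_mean, best_var)
```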

Conclusion
Predicting the rate of dissolution from a large number of input
features is a complex problem.
Feature selection lets us select the most influential features
among those available without worsening the performance.
Feature extraction techniques provide a reduced set of new
features that performs better than all the features taken
together.
We have analysed the performance of GPReg, LReg, MLP and
SMOReg.
With feature selection, GPReg performs best, offering the lowest
average RMSE (15.322, VAR 3.782) with 10 selected features.
PCA, used to reduce the dimension to 30, offers the best result
with MLP: the lowest average RMSE (13.590) and VAR (1.560).

Future Work
Focus on the various types of stochastic feature selection
methods.
Explore other types of regression models.
Study the use of ensembles of elementary regression models.
Compare ensemble methods.

Thank You!
varun.kumar.ojha@vsb.cz
