Enm fy17nano qsar

Emerging NanoMaterials –
nanoQSAR FY17
Paul Harten
July 18, 2016

Assumptions
• Setting up and running the same experiment in the laboratory should get
the same results, time after time (within an error).
• The results of experiments, and how experiments are set up and run can
be described by a quantitative relationship.
• This relationship is a function $y = f(x_1, x_2, \ldots, x_m)$, where $y$ is the result of
the experiment and $x_1, \ldots, x_m$ are descriptors of the experiment. Every
time the values of the descriptors are the same, the result is the same.
• What that function looks like and which descriptors should be used are what
we are trying to find out.

Descriptors and Responses
• The descriptors of an experiment may be divided into:
o Properties of “pristine” material (e.g. surface charge, zeta potential);
o Properties of “weathered” or “aged” material (e.g. hydration);
o Parameters of experiment and assay increments (e.g. temperature,
nanomaterial concentration)
• The experimental responses may be results such as:
o The percentage of human lung cells that die after 1 day
o The percentage of human lung cells that die after 2 days
o Similar results for different cell types

Descriptors and Responses (cont.)
Table layout (one row per experiment): descriptor columns X1 to X23, grouped under Pristine, Weathered, and Experimental, followed by response columns Y1 and Y2.

Descriptor and Response Relationship
• A row is generated for each experiment conducted, recording the values
the descriptors take on and the results of the experiment.
• If we assume a linear relationship between the descriptors and the results,
the function becomes $y = f(x_1, x_2, \ldots, x_m) = b_0 + b_1 x_1 + \ldots + b_m x_m$
• The results of multiple experiments can be represented using the matrix
notation
$y = Xb + e$
where $X$ has m columns of descriptors and n rows of experiments, $b$ holds the
regression coefficients, and $e$ holds the error terms.
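
To make the matrix notation concrete, here is a minimal sketch in plain Python. The descriptor values, coefficients, and observed results are made up for illustration, and prepending a constant 1 to each row so that b0 is handled like any other coefficient is an assumption of this sketch, not something stated on the slide.

```python
def design_row(descriptors):
    """Prepend a constant 1 so that b[0] plays the role of the intercept b0."""
    return [1.0] + list(descriptors)

def predict(X, b):
    """Evaluate Xb, one value per experiment (row of X)."""
    return [sum(x_ij * b_j for x_ij, b_j in zip(row, b)) for row in X]

# Hypothetical data: three experiments, two descriptors each.
experiments = [[1.0, 2.0], [3.0, 0.5], [0.0, 4.0]]
b = [0.5, 2.0, -1.0]                  # b0, b1, b2 (illustrative values)
y_observed = [0.6, 5.9, -3.4]         # illustrative measured results

X = [design_row(row) for row in experiments]
y_model = predict(X, b)
# e is whatever the linear model does not explain: e = y - Xb
e = [y_o - y_m for y_o, y_m in zip(y_observed, y_model)]
```

Each row of `X` corresponds to one experiment, so adding an experiment adds a row, while adding a descriptor adds a column and one more coefficient.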

Partial Least Squares (PLS), y = b0 + b1 * x1 + e

NanoQSAR
• Select 80% of the experimental results randomly to build a QSAR model
$$R^2 = 1 - \frac{\sum (y_{actual} - y_{model})^2}{\sum (y_{actual} - y_{mean})^2}$$
• The closer $R^2$ is to 1.0, the better the model fits the data and the smaller the error terms
• With the remaining 20%, predict results
$$Q^2 = 1 - \frac{\sum (y_{actual} - y_{predict})^2}{\sum (y_{actual} - y_{mean})^2}$$
• In general, $R^2 \geq Q^2$
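
The split-and-score procedure can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the data are synthetic, the model is a one-descriptor least-squares line, the helper names `fit_line` and `fit_quality` are hypothetical, and the mean used in the Q2 denominator is taken to be the training-set mean (the slide does not say which mean is meant).

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_mean - b1 * x_mean, b1

def fit_quality(actual, predicted, mean):
    """1 - (sum of squared errors) / (sum of squared deviations from mean)."""
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Synthetic data (assumption): a noisy straight line, seeded for repeatability.
rng = random.Random(0)
xs = [float(i) for i in range(20)]
ys = [2.0 + 0.5 * x + rng.gauss(0.0, 0.3) for x in xs]

# Random 80% / 20% split of the experiments.
indices = list(range(len(xs)))
rng.shuffle(indices)
cut = len(indices) * 4 // 5
train, test = indices[:cut], indices[cut:]

train_x, train_y = [xs[i] for i in train], [ys[i] for i in train]
test_x, test_y = [xs[i] for i in test], [ys[i] for i in test]

b0, b1 = fit_line(train_x, train_y)
train_mean = sum(train_y) / len(train_y)

r2 = fit_quality(train_y, [b0 + b1 * x for x in train_x], train_mean)
q2 = fit_quality(test_y, [b0 + b1 * x for x in test_x], train_mean)
```

R2 is computed on the data the model was fitted to, while Q2 is computed on data the model never saw, which is why Q2 is usually the lower of the two.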

Latent Structure of X (and Y)
• When there are correlations (collinearity) between the columns of 𝑋, the
calculated regression coefficients 𝑏 become unstable.
• Because of this, multivariate projection methods such as PLS (Projections
to Latent Structures) are increasingly being used in QSAR analysis.
• This method projects the descriptors onto a lower-dimensional hyperplane
in descriptor space.
• More stable calculated regression coefficients 𝑏 can be found using this
inherent latent structure of matrix 𝑋.
• Similar reduction of dimensions can be done for experimental results.
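
As a rough illustration of the projection idea, the sketch below implements a single-component PLS1 fit in plain Python. It is a simplified teaching version under stated assumptions (one response, one latent component, made-up data), and the function name `pls1_one_component` is hypothetical. The two descriptor columns are perfectly collinear, a case in which ordinary least squares has no unique solution.

```python
import math

def pls1_one_component(X, y):
    """Single-component PLS1: project centered X onto one latent direction."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(m)] for row in X]
    yc = [v - y_mean for v in y]

    # Weight vector: direction in descriptor space with maximal covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(m)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]

    # Scores: each experiment projected onto the latent direction.
    t = [sum(Xc[i][j] * w[j] for j in range(m)) for i in range(n)]

    # Regress y on the scores, then map back to descriptor coefficients.
    q = sum(t_i * y_i for t_i, y_i in zip(t, yc)) / sum(t_i * t_i for t_i in t)
    b = [q * w_j for w_j in w]
    b0 = y_mean - sum(b_j * xm_j for b_j, xm_j in zip(b, x_mean))
    return b0, b

# Made-up data: the second descriptor is exactly twice the first.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
X = [[v, 2.0 * v] for v in x1]
y = [3.0 * v + 1.0 for v in x1]

b0, b = pls1_one_component(X, y)
y_model = [b0 + sum(b_j * x_j for b_j, x_j in zip(b, row)) for row in X]
```

Despite the collinearity, the fit returns one well-defined set of coefficients that splits the effect across both descriptors and still reproduces `y`.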

Latent Structure of X (and Y)

Many Separate Clusters
• Nature is found to organize experimental results in a clustered and
discontinuous way.
• How many clusters exist may be found using a k-means algorithm that starts
from n clusters, where n is the number of experimental results.
• The number of clusters is reduced at each iteration by merging the closest clusters.
• Also at each iteration, QSAR modeling is performed for all clusters that are
large enough, and $Q^2$, which measures how close the predicted values are to
the actual values, is calculated.
• At the final step, the number of clusters with the best $Q^2$ is selected.
• If any clusters are still not large enough for QSAR modeling,
new experimental data needs to be generated.
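
The merging loop described above can be sketched as follows. The slide calls the method k-means, but starting from n single-point clusters and repeatedly merging the closest pair is what is usually called agglomerative clustering, so that is what this sketch implements. Measuring closeness by centroid distance, the `min_size` threshold, and the sample points are all assumptions; the per-iteration QSAR fit and Q2 scoring are left out.

```python
import math

def centroid(points):
    """Mean position of a cluster's points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]

def merge_closest(clusters):
    """Return a new cluster list with the two closest clusters merged."""
    best = None
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            d = math.dist(centroid(clusters[i]), centroid(clusters[j]))
            if best is None or d < best[0]:
                best = (d, i, j)
    _, i, j = best
    merged = clusters[i] + clusters[j]
    rest = [c for k, c in enumerate(clusters) if k not in (i, j)]
    return rest + [merged]

def cluster_history(points):
    """Clusterings for n, n-1, ..., 1 clusters, in that order."""
    clusters = [[p] for p in points]
    history = [clusters]
    while len(clusters) > 1:
        clusters = merge_closest(clusters)
        history.append(clusters)
    return history

# Made-up experiments described by two descriptors each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
history = cluster_history(points)

min_size = 3  # hypothetical minimum cluster size for building a QSAR model
two_clusters = history[-2]
large_enough = [c for c in two_clusters if len(c) >= min_size]
```

In the full procedure, each entry of `history` would be scored by fitting a model per sufficiently large cluster and computing Q2, and the entry with the best score would be kept.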

Many Separate Clusters (cont.)

Emerging NanoMaterials
• The cluster that an emerging nanomaterial is most similar to can be
identified by including theoretical descriptors such as SMILES strings and the
x, y, z coordinates of the different molecules in the nanostructure.
• The emerging nanomaterials can then be associated with the closest
cluster.
• Experimental results are predicted using the regression equation found for
that particular cluster:
$y = b_0 + b_1 x_1 + \ldots + b_m x_m$
• As before, if an emerging nanomaterial lies very far from every
existing cluster, new experimental data needs to be generated to fill that
hole in the database.
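
The assignment-and-prediction step might look like the sketch below. The centroids, per-cluster coefficients, Euclidean distance, and `max_distance` cutoff are all hypothetical; the slides do not specify how "very far" is decided.

```python
import math

def nearest_cluster(x, centroids, max_distance):
    """Index of the closest centroid, or None if even that one is too far."""
    distances = [math.dist(x, c) for c in centroids]
    best = min(range(len(centroids)), key=distances.__getitem__)
    if distances[best] > max_distance:
        return None
    return best

def predict(x, b0, b):
    """Evaluate y = b0 + b1*x1 + ... + bm*xm for one material."""
    return b0 + sum(b_j * x_j for b_j, x_j in zip(b, x))

# Hypothetical cluster centroids and per-cluster regression coefficients.
centroids = [[0.0, 0.0], [10.0, 10.0]]
models = [(1.0, [2.0, 0.0]), (5.0, [0.0, -1.0])]  # (b0, [b1, b2]) per cluster

def predict_for_material(x, max_distance=3.0):
    """Predict with the nearest cluster's model; None means more data is needed."""
    k = nearest_cluster(x, centroids, max_distance)
    if k is None:
        return None
    b0, b = models[k]
    return predict(x, b0, b)
```

A return value of `None` marks a material that falls in a gap between clusters, the case where the slide calls for new experiments.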
