Emerging NanoMaterials –
nanoQSAR FY17
Paul Harten
July 18, 2016
Assumptions
• Setting up and running the same experiment in the laboratory should get
the same results, time after time (within an error).
• The relationship between how an experiment is set up and run and the results
it produces can be described quantitatively.
• This relationship is a function $y = f(x_1, x_2, \ldots, x_m)$, where $y$ is the result of
the experiment and $x_1, \ldots, x_m$ are descriptors of the experiment. Every
time the descriptors take the same values, the result is the same.
• What that function looks like, and which descriptors should be used, are what
we are trying to find out.
Descriptors and Responses
• The descriptors of an experiment may be divided into:
o Properties of “pristine” material (e.g. surface charge, zeta potential);
o Properties of “weathered” or “aged” material (e.g. hydration);
o Parameters of the experiment and assay increments (e.g. temperature,
nanomaterial concentration).
• The experimental responses may be results such as:
o The percentage of human lung cells that expire after 1 day
o The percentage of human lung cells that expire after 2 days
o Similar results for different cell types
Descriptors and Responses (cont.)
[Figure: data matrix with one row per experiment. Column groups: Pristine, Weathered, and Experimental descriptors (X1 to X23), followed by Responses (Y1, Y2); further rows elided.]
Descriptor and Response Relationship
• A row is generated for each experiment conducted, recording the values
the descriptors take on and the results of the experiment.
• If we assume a linear relationship between the descriptors and the results,
the function becomes $y = f(x_1, x_2, \ldots, x_m) = b_0 + b_1 x_1 + \ldots + b_m x_m$
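• The results of multiple experiments can be represented in matrix
notation as
$y = Xb + e$
where $X$ has n rows (one per experiment) and m columns of descriptors (plus a leading column of ones when the intercept $b_0$ is included in $b$), and $e$ is the vector of errors.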
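The matrix form $y = Xb + e$ can be fitted by ordinary least squares. The sketch below is a minimal NumPy illustration, assuming the descriptors are already numeric; the function name `fit_linear_qsar` and the synthetic data are hypothetical, not part of the original work.

```python
import numpy as np

def fit_linear_qsar(X, y):
    """Ordinary least-squares fit of y = Xb + e.

    X: (n, m) descriptor matrix, one row per experiment.
    y: (n,) responses.
    Returns (b0, b): intercept and coefficient vector of length m.
    """
    n = X.shape[0]
    # Prepend a column of ones so that b0 is estimated alongside b1..bm.
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[0], coef[1:]

# Synthetic example: 20 hypothetical experiments, 3 descriptors.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = 1.0 + X @ np.array([2.0, -0.5, 0.0]) + rng.normal(scale=0.01, size=20)

b0, b = fit_linear_qsar(X, y)
print(round(float(b0), 2), np.round(b, 2))  # close to 1.0 and [2.0, -0.5, 0.0]
```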
Partial Least Squares (PLS), $y = b_0 + b_1 x_1 + e$
NanoQSAR
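• Randomly select 80% of the experimental results (the training set) to build a QSAR model, and measure its fit with
$R^2 = 1 - \frac{\sum (y_{\text{actual}} - y_{\text{model}})^2}{\sum (y_{\text{actual}} - y_{\text{mean}})^2}$
• How close $R^2$ is to 1.0 reflects the quality of the model and the size of the error terms.
• Use the model to predict the remaining 20% of the results (the test set), and measure the predictive quality with
$Q^2 = 1 - \frac{\sum (y_{\text{actual}} - y_{\text{predict}})^2}{\sum (y_{\text{actual}} - y_{\text{mean}})^2}$
• In general, $R^2 \ge Q^2$, since a model usually fits the data it was built on better than data it has not seen.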
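The 80/20 procedure and the two statistics above can be sketched as follows. This is a minimal illustration on synthetic data; the helper names are hypothetical, and using the test-set mean as $y_{\text{mean}}$ in $Q^2$ is an assumption (some QSAR work uses the training-set mean instead).

```python
import numpy as np

def r_squared(y_actual, y_model):
    """1 - SS_residual / SS_total; used for both R^2 and Q^2."""
    y_actual = np.asarray(y_actual, dtype=float)
    ss_res = np.sum((y_actual - np.asarray(y_model)) ** 2)
    ss_tot = np.sum((y_actual - y_actual.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def split_80_20(n, seed=0):
    """Randomly split row indices 0..n-1 into 80% training and 20% test."""
    idx = np.random.default_rng(seed).permutation(n)
    n_train = int(round(0.8 * n))
    return idx[:n_train], idx[n_train:]

# Hypothetical data: 50 experiments, 2 descriptors, noisy linear response.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
y = 0.5 + X @ np.array([1.5, -1.0]) + rng.normal(scale=0.3, size=50)

train, test = split_80_20(len(y))
A_train = np.column_stack([np.ones(len(train)), X[train]])
coef, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)

A_test = np.column_stack([np.ones(len(test)), X[test]])
R2 = r_squared(y[train], A_train @ coef)  # fit quality on the 80%
Q2 = r_squared(y[test], A_test @ coef)    # predictive quality on the 20%
print(f"R2 = {R2:.3f}, Q2 = {Q2:.3f}")
```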
Latent Structure of X (and Y)
• When there are correlations (collinearity) between the columns of 𝑋, the
calculated regression coefficients 𝑏 become unstable.
• Because of this, multivariate projection methods such as PLS (Projections
to Latent Structures) are increasingly being used in QSAR analysis.
• This method projects the descriptors onto a lower-dimensional hyperplane
of latent variables.
• More stable regression coefficients 𝑏 can be calculated by using this
inherent latent structure of matrix 𝑋.
• A similar dimension reduction can be applied to the experimental results (Y).
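To illustrate why projecting onto latent variables stabilizes $b$, here is a minimal single-response PLS (PLS1) sketch using the NIPALS algorithm. The slides do not say which PLS variant was used, so NIPALS, the function name `pls1_fit`, and the synthetic collinear data are assumptions.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """PLS1 regression via NIPALS for a single response y.

    Returns (b0, b) so that predictions are b0 + X @ b in original units.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean      # centre descriptors and response
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                    # direction of max covariance with y
        w = w / np.linalg.norm(w)
        t = Xk @ w                       # latent scores (projection of X)
        tt = t @ t
        p = Xk.T @ t / tt                # X loadings
        q = (yk @ t) / tt                # y loading
        Xk = Xk - np.outer(t, p)         # deflate X and y
        yk = yk - q * t
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    # Map latent-space coefficients back to the original descriptors.
    b = W @ np.linalg.solve(P.T @ W, Q)
    b0 = y_mean - x_mean @ b
    return b0, b

# Two nearly collinear hypothetical descriptors: x2 is x1 plus tiny noise.
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = x1 + 0.01 * rng.normal(size=50)
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.1, size=50)

b0, b = pls1_fit(X, y, n_components=1)
print(np.round(b, 2))  # the weight is shared roughly equally by x1 and x2
```

With as many components as linearly independent descriptors, PLS1 reproduces the ordinary least-squares coefficients; keeping fewer components trades a little bias for much more stable coefficients when descriptors are collinear.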
Latent Structure of X (and Y)
Many Separate Clusters
• Nature is found to organize experimental results into separate,
discontinuous clusters.
• The number of clusters may be found using an agglomerative (bottom-up)
clustering algorithm that starts from n clusters, where n is the number of
experimental results.
• Each iteration reduces the number of clusters by merging the two closest clusters.
• Also at each iteration, QSAR modeling is performed for every cluster that is
large enough, and $Q^2$, which measures how close the predicted values are to
the actual values, is calculated.
• At the final step, the number of clusters with the best $Q^2$ is selected.
• If any clusters are still not large enough for QSAR modeling,
new experimental data need to be generated.
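The merge-and-score loop described above can be sketched as follows, assuming numeric descriptors and a plain linear QSAR model per cluster. Centroid linkage, the `min_size` threshold of 10, averaging $Q^2$ over the clusters that are large enough, and all function names are illustrative assumptions, not details taken from the slides. The two-group data set is synthetic.

```python
import numpy as np

def agglomerate(points):
    """Start from n singleton clusters and repeatedly merge the two clusters
    whose centroids are closest (centroid linkage is an assumption).
    Returns {number_of_clusters: list of index lists}."""
    clusters = [[i] for i in range(len(points))]
    history = {len(clusters): [list(c) for c in clusters]}
    while len(clusters) > 1:
        centroids = np.array([points[c].mean(axis=0) for c in clusters])
        best_d, best_i, best_j = np.inf, -1, -1
        for i in range(len(clusters)):
            # Distances from centroid i to every later centroid.
            d = np.linalg.norm(centroids[i + 1:] - centroids[i], axis=1)
            if d.size and d.min() < best_d:
                best_d, best_i, best_j = d.min(), i, i + 1 + int(d.argmin())
        clusters[best_i] = clusters[best_i] + clusters[best_j]
        del clusters[best_j]             # best_j > best_i, so best_i is intact
        history[len(clusters)] = [list(c) for c in clusters]
    return history

def cluster_q2(X, y, idx, rng):
    """Q2 of a linear model fitted on a random 80% of one cluster and
    evaluated on the remaining 20%."""
    idx = rng.permutation(idx)
    n_train = int(round(0.8 * len(idx)))
    tr, te = idx[:n_train], idx[n_train:]
    A_tr = np.column_stack([np.ones(len(tr)), X[tr]])
    coef, *_ = np.linalg.lstsq(A_tr, y[tr], rcond=None)
    pred = np.column_stack([np.ones(len(te)), X[te]]) @ coef
    return 1.0 - np.sum((y[te] - pred) ** 2) / np.sum((y[te] - y[te].mean()) ** 2)

def choose_clustering(X, y, min_size=10, seed=0):
    """Score every level of the merge history by the mean Q2 over clusters
    with at least min_size members; return the best level and all scores."""
    rng = np.random.default_rng(seed)
    history = agglomerate(X)
    scores = {}
    for k, clusters in history.items():
        big = [np.array(c) for c in clusters if len(c) >= min_size]
        if big:
            scores[k] = float(np.mean([cluster_q2(X, y, c, rng) for c in big]))
    best_k = max(scores, key=scores.get)
    return best_k, history[best_k], scores

# Synthetic data: two groups with opposite descriptor-response slopes.
rng = np.random.default_rng(0)
XA = rng.normal(size=(30, 2))            # group A near (0, 0)
XB = rng.normal(size=(30, 2)) + 20.0     # group B near (20, 20)
yA = 2 * XA[:, 0] + XA[:, 1]
yB = -2 * (XB[:, 0] - 20) - (XB[:, 1] - 20)
X = np.vstack([XA, XB])
y = np.concatenate([yA, yB]) + rng.normal(scale=0.05, size=60)

best_k, clusters, scores = choose_clustering(X, y)
print(best_k, round(scores[1], 3), round(scores[2], 3))
```

On this synthetic data the two-cluster level separates the groups and scores a $Q^2$ close to 1, while the single global model scores much lower because the groups have opposite slopes. The naive merge loop is roughly cubic in n, which is fine for small data sets; a library implementation would be preferable at scale.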
Many Separate Clusters (cont.)
Emerging NanoMaterials
• The cluster that an emerging nanomaterial is most similar to can be
identified by including theoretical descriptors, such as SMILES strings and the
x, y, z coordinates of the molecules in the nanostructure.
• The emerging nanomaterial can then be assigned to the closest
cluster.
• Experimental results are predicted using the regression equation found for
that particular cluster:
$y = b_0 + b_1 x_1 + \ldots + b_m x_m$
• As before, if an emerging nanomaterial is found very far from every
existing cluster, new experimental data need to be generated to fill that
gap in the database.
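Assigning an emerging nanomaterial to its closest cluster and applying that cluster's regression can be sketched as below. The sketch assumes the theoretical descriptors (e.g. SMILES strings, coordinates) have already been converted into numeric vectors in the same space as the cluster centroids; Euclidean distance, the hypothetical `max_distance` cut-off, and the function name are assumptions.

```python
import numpy as np

def predict_emerging(x_new, centroids, models, max_distance):
    """Assign a new material to its nearest cluster and predict its response.

    x_new:        (m,) numeric descriptors of the emerging nanomaterial
    centroids:    (k, m) cluster centroids in the same descriptor space
    models:       list of k (b0, b) pairs, one regression per cluster
    max_distance: hypothetical cut-off beyond which no cluster counts as close
    Returns (cluster_index, prediction); prediction is None when the material
    lies too far from every cluster, i.e. new experimental data are needed.
    """
    x_new = np.asarray(x_new, dtype=float)
    dist = np.linalg.norm(np.asarray(centroids, dtype=float) - x_new, axis=1)
    k = int(np.argmin(dist))
    if dist[k] > max_distance:
        return k, None
    b0, b = models[k]
    return k, float(b0 + x_new @ np.asarray(b, dtype=float))

centroids = [[0.0, 0.0], [10.0, 10.0]]
models = [(1.0, [2.0, 0.0]), (0.0, [0.0, -1.0])]
print(predict_emerging([9.0, 11.0], centroids, models, max_distance=5.0))   # (1, -11.0)
print(predict_emerging([50.0, 50.0], centroids, models, max_distance=5.0))  # (1, None)
```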